"If we take in our hand any volume... let us
ask, Does it contain any abstract reasoning concerning
quantity or number? No. Does it contain any experimental
reasoning concerning matter of fact and existence?
No. Commit it then to the flames: for it can
contain nothing but sophistry and illusion!"


Hume, David, An Enquiry Concerning Human 
Understanding, (1777). 
INTRODUCTION
by R. S. Woodworth 


* 


Modern problems and needs are forcing statistical methods and
statistical ideas more and more to the fore. There are so many things
we wish to know which cannot be discovered by a single observation,
or by a single measurement. We wish to envisage the behavior of a
man who, like all men, is rather a variable quantity, and must be
observed repeatedly and not once for all. We wish to study the social
group, composed of individuals differing one from another. We
should like to be able to compare one group with another, one race
with another, as well as one individual with another individual, or
the individual with the norm for his age, race or class. We wish to
trace the curve which pictures the growth of a child, or of a
population. We wish to disentangle the interwoven factors of heredity
and environment which influence the development of the individual, and
to measure the similarly interwoven effects of laws, social customs
and economic conditions upon public health, safety and welfare
generally. Even if our statistical appetite is far from keen, we all
of us should like to know enough to understand, or to withstand, the
statistics that are constantly being thrown at us in print or
conversation, much of it pretty bad statistics. The only cure for bad
statistics is apparently more and better statistics. All in all, it
certainly appears that the rudiments of sound statistical sense are
coming to be an essential of a liberal education.

Now there are different orders of statisticians. There is, first in
order, the mathematician who invents the method for performing a
certain type of statistical job. His interest, as a mathematician, is
not in the educational, social or psychological problems just alluded
to, but in the problem of devising instruments for handling such
matters. He is the tool-maker of the statistical industry, and one
good tool-maker can supply many skilled workers. The latter are
quite another order of statisticians. Supply them with the
mathematician's formulas, map out the procedure for them to follow,
provide working charts, tables and calculating machines, and they will
compute from your data the necessary averages, probable errors and
correlation coefficients. Their interest, as computers, lies in the quick
and accurate handling of the tools of the trade. But there is a
statistician of yet another order, in between the other two. His primary
interest is psychological, perhaps, or it may be educational. It is he
who has selected the scientific or practical problem, who has organ-


intended primarily for statisticians of the last-mentioned type. It 
lays out before him the tools of the trade; it explains very fully and 
carefully the manner of handling each tool; it affords practice in 


adapted to the student's use. To an unusual degree, it succeeds in
meeting the student upon his own ground. 


COLUMBIA UNIVERSITY
(1926) 


PREFACE 
to the Fourth Edition 


*


Perhaps the author who revises an elementary textbook in statistical
method is always tempted to add new material and to eliminate
old. If he has tried to keep up even approximately with the field, he
will have encountered new techniques which will seem to him important
and worth including in a new text. Furthermore, if he has taught
the beginning course in statistics for many years, elementary (but
perhaps fundamental) procedures may, through sheer repetition,
have become so simple and routine as no longer to be considered
worthy of attention. Undoubtedly either or both of these attitudes
can work to the disadvantage of the revised book, not to mention the
beginning student. The addition of extensive new materials may
easily make a book almost unusable to a beginner, especially if the
added material is of an advanced nature and not too well integrated
with the rest of the text. And the toning down or elimination of
necessary preliminary methods neglects the fact that each new
generation of students begins from scratch and that things simple to the
instructor are not always equally simple to the student.

In preparing the present (fourth) edition of this book I have tried 
to avoid the pitfalls of overextension as well as of underemphasis. 
My purpose is the same as it was in 1926 when the first edition of 
this book was written, namely, to present the fundamentals of statis- 
tical method most useful to students in psychology and education. 
In accordance with this plan, I have not included highly specialized 
techniques (factor analysis, psychophysical methods, curve fitting),
nor methods which are applicable mainly to test construction, item 
analysis and the like. It is my experience that specialized as well as 
advanced topics belong in courses designed to follow the elementary 
course.
Chapters dealing with reliability and inference have been completely
rewritten and several obsolete and marginally useful techniques
dropped out. One new chapter (Chapter 10) dealing with
analysis of variance has been included for those who wish to intro- 
duce this topic in the first course. For the convenience of the in- 
structor the present edition has been divided into three parts. Part I 
(Descriptive Statistics) includes Chapters 1-6; Part II (Prediction
and Inference), Chapters 7-11; and Part III (Special Topics), Chap- 
ters 12-16. More than a hundred and fifty examples with answers 
will be found at the ends of the chapters. 

Although the present edition is about thirty pages shorter 
than the earlier, I suspect that it still contains too much material 
for the usual beginning course. In a short course—one semester or 
summer session—I suggest that the instructor concentrate on Part I, 
as I doubt if he can cover more. If the course extends over a year or 
meets several times a week, I would add to Part I Chapters 7, 8, 9, 
12 and 13. Also, if time permits, I would teach Chapters 10, 11, 14 
and 15, or assign them as outside work to the better students.
Chapter 16 is supplementary to Chapter 15 and is intended to be used
mainly for reference. 

Many teachers who have used this book in the past have been kind 
enough to offer suggestions looking to its improvement. To all of
these go my sincere thanks even though I have not been able in 
every case to follow their advice. I am indebted to Dr. Lincoln E.
Moses for a critical reading of Chapters 8, 9 and 10. 


Henry E. Garrett
Columbia University
THE FREQUENCY DISTRIBUTION

I. Measures in General


1. What is meant by measurement


The measurement of individuals and objects may be of various
kinds, and may be taken to varying degrees of precision. When 
individuals or things have been ranked or arranged in a series with 
respect to some attribute or trait, we have perhaps the simplest sort 
of measurement. Children may be put in order for height, weight, or 
regularity of school attendance; salesmen may be ranked for years 
of experience, or amount of sales over a year; advertisements or pic- 
tures may be ranked for amount of color, or for cost, or for sales 
appeal. Rank order tells us serial position in the group but it does 
not give us a measurement. We cannot add or subtract ranks as we 
can inches or pounds since a person's rank is always relative to the 
ranks of other members of his group, and is never absolute, i.e., in 
terms of some known unit.

Measurements of individuals may also be expressed as scores. 
Scores are usually given in terms of time taken to complete a task, 
or amount done in a given time; less often scores are expressed in 
terms of difficulty of the task performed, or excellence of the final 
result. Scores vary with performance, although score-changes rarely
parallel performance-changes exactly. When scores are expressed
in equal units, they constitute a scale. Scaled tests in psychology and
education have equal units or steps but do not possess an absolute
zero point. On the other hand, the "c.g.s. scales" (centimeters, grams,
seconds) of physics do have equal units and an absolute zero point.
"Scores" from physical scales are called measures; they may be
added or subtracted and a "score" of twenty inches, say, is twice a
"score" of ten inches. Scaled scores from mental tests may also be
added or subtracted just as we add and subtract inches. But we
cannot say that a score of 40 achieved on a test is twice as good as a
score of 20, since neither is measured from a zero point of just no
ability. Traits and other characteristics, measurements of which are
expressible as scores, are known generally as variables.


2. Continuous and discrete series 


In the measurement of mental and social traits, most of the vari- 
ables with which we deal fall into continuous series. A continuous 


gaps occur in a truly continuous series, these are to be attributed to
a failure to measure enough cases, to the relative crudity of the
measuring instrument, or to some other factor of a like sort, rather
than to the lack of measures within the gaps.


handling continuous data.


In the following sections we shall define more precisely what is
meant by a score and shall then show how scores may be classified
into what is called a frequency distribution.
3. The meaning of scores in continuous series


Scores or other numbers in continuous series are to be thought of
as distances along a continuum, rather than as discrete points. An
inch is the linear magnitude between two divisions on a foot-rule;
and, in like manner, a score in a mental test is a unit distance
between two limits. A score of 150 upon an intelligence examination,
for example, represents the interval 149.5 up to 150.5. The exact
midpoint of this score-interval is 150 as shown below.


              Score 150
    |            150            |
  149.5           ^           150.5

Other scores are to be interpreted in the same way. A score of 8 on
the Thorndike Handwriting Scale, for instance, includes all values
from 7.5 up to 8.5; i.e., any value from a point .5 unit below 8, to
.5 unit above 8. Hence, 7.7, 8.0, and 8.4 may all be scored 8. An
interval extending from .5 unit below to .5 unit above the given value
is the usual mathematical meaning of a single score.

There is another and somewhat different meaning which a test 
score may have. According to this second view, a score of 150 means 
that an individual has done at least 150 items correctly, but not 151. 
Hence, a score of 150 represents any value between 150 and 151. Any 
fractional value greater than 150, but less than 151, e.g., 150.3 or 
150.8, since it falls within the interval 150—151 is scored simply as 
150. The middle of the score is 150.5. (See below.) 


              Score 150
    |           150.5           |
   150            ^            151

Both of these ways of defining a score are valid and useful. Which
to use will depend upon the way in which the test is scored and on the 
meaning of the units of measurement employed. If each of ten boys 
is recorded as having a height of sixty-four inches this will ordinarily 
mean that these heights fall between 63.5 and 64.5 inches (middle 
value 64 in.), and not between sixty-four and sixty-five inches 
(middle value 64.5 in.). On the other hand, the ages of twenty-five 
children, all recorded as being nine years old, will most probably lie 
between nine and ten years; will be greater than nine and less than 
ten years (middle value 9.5). But “nine years old" must be taken in 
many studies to mean 8.5 up to 9.5 years with a middle value of nine 
years. The point to remember is that results obtained from treating
clearly indicated otherwise. This will be the method followed
throughout this book. That is, scores of 62 and 231, say, will usually
mean 61.5 up to 62.5, and 230.5 up to 231.5, and not 62 up to 63, and
231 up to 232.
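
The two meanings of a score described in this section can be sketched in a few lines of Python. This is only an illustration: the function name `score_limits` and the labels "mathematical" and "completed" are our own and do not come from the text.

```python
def score_limits(score, convention="mathematical", unit=1.0):
    """Return (lower limit, midpoint, upper limit) of the interval a score stands for.

    "mathematical": the recorded score is the midpoint (150 -> 149.5 up to 150.5).
    "completed":    the recorded score is the lower limit (150 -> 150 up to 151).
    Both labels are hypothetical names chosen for this sketch.
    """
    if convention == "mathematical":
        lower = score - unit / 2
    elif convention == "completed":
        lower = float(score)
    else:
        raise ValueError(f"unknown convention: {convention!r}")
    upper = lower + unit
    return lower, (lower + upper) / 2, upper


print(score_limits(150))               # (149.5, 150.0, 150.5)
print(score_limits(150, "completed"))  # (150.0, 150.5, 151.0)
print(score_limits(9, "completed"))    # (9.0, 9.5, 10.0), ages recorded as "nine"
```

The default follows the mathematical meaning, since that is the reading the text adopts for scores such as 62 and 231.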


II. Drawing Up a Frequency Distribution


1. The classification of measures 


(1) Determination of the range or the interval between the largest
and smallest scores. The range is found by subtracting the smallest
from the largest score.

(2) Decision as to the number and size of the groups to be used in
classification. The number and size of these class-intervals will


we are dealing. 


(3) Tabulation of the separate scores within their proper
class-intervals.

These three principles of classification are illustrated in Table 1.
The figures in this table represent the Army Alpha scores earned by
fifty college men. Since the highest score is 197, and the lowest 142,
the range (197 - 142) is exactly 55. In deciding upon the number of
classes to be used in grouping, a good general rule is to select by trial
an interval which will yield not more than twenty nor less than ten
classes.*

* This rule must often be broken when the number of scores is very
large or very small.

The number of class-intervals which a given range will yield can
be determined approximately (within one interval) by dividing the
range by the interval tentatively chosen. In the present problem, 55
(the range) divided by 5 (the interval) gives 11, which is one less
than the actual number of intervals, namely, 12. An interval of three
units will yield nineteen classes; an interval of ten units, six
classes.
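
The arithmetic of this paragraph can be checked with a short sketch. The function names are our own, and the second function assumes that every class-interval begins at a multiple of the interval size; the text states this only for the interval of five.

```python
def approximate_classes(highest, lowest, interval):
    """Range divided by the tentative interval (good to within one class)."""
    return (highest - lowest) / interval


def actual_classes(highest, lowest, interval):
    """Classes needed when each interval starts at a multiple of `interval`.

    The starting convention is an assumption made for this sketch.
    """
    return highest // interval - lowest // interval + 1


for size in (3, 5, 10):
    print(size,
          round(approximate_classes(197, 142, size), 2),
          actual_classes(197, 142, size))
# 3 18.33 19
# 5 11.0 12
# 10 5.5 6
```

Only the interval of five keeps the number of classes between ten and twenty, which is why it is the one adopted for Table 1.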


TABLE 1 The tabulation of Army Alpha scores made by fifty college
students


1. The original scores ungrouped

 185  166  176  145  166  191  177  164  171  174
 147  178  176 †142  170  158  171  167  180  178
 173  148  168  187  181  172  165  169  178  184
 175  156  158  187  156  172  162  193  173  183
*197  181  151  161  158  172  162  179  188  179

* Highest score   † Lowest score


2. The same fifty scores grouped into a frequency distribution

      (1)              (2)             (3)
Class-Intervals      Tallies       f (frequency)
195 up to 200        /                   1
190 up to 195        //                  2
185 up to 190        ////                4
180 up to 185        /////               5
175 up to 180        ///// ///           8
170 up to 175        ///// /////        10
165 up to 170        ///// /             6
160 up to 165        ////                4
155 up to 160        ////                4
150 up to 155        //                  2
145 up to 150        ///                 3
140 up to 145        /                   1

                                    N = 50


The tabulation of the separate scores within their class-intervals 
is shown in Table 1. In the first column of this table the
class-intervals have been listed serially from the smallest score at the
bottom of the column to the largest score at the top. Each class- 
interval comprises exactly five scores. The first interval “140 up to 
145” begins with score 140 and ends with 144, thus including the 
five scores 140, 141, 142, 143, and 144. The second interval “145 up 
to 150” begins with 145 and ends with 149, i.e., at score 150. The last 
interval “195 up to 200” begins with score 195 and ends at score 200, 
thus including the scores 195, 196, 197, 198, 199. In column (2), 
marked “Tallies,” the separate scores have been listed opposite their 
proper intervals. The first score, 185, is represented by a tally placed 
opposite interval “185 up to 190”; the second score, 147, by a tally 
placed opposite interval "145 up to 150"; and the third score, 173, by
a tally placed opposite "170 up to 175." The remaining scores have
been tabulated in the same way. When all fifty scores have been
listed, the total number of tallies on each class-interval (i.e., the
frequency) is written in column (3) headed f (frequency). The sum


ingly, of each successive interval, are multiples of five. A class- 
interval “142 up to 147” is just as good theoretically as a class- 
interval “140 up to 145”; but the second is easier to handle from 
the standpoint of the arithmetic involved.
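
The tallying procedure can be sketched as follows, assuming (as in Table 1) that each class-interval begins at a multiple of the interval size. The function name `tally` is our own, and only the first few scores of Table 1 are used to keep the illustration short.

```python
from collections import Counter


def tally(scores, interval=5):
    """Count the scores falling in each class-interval.

    Keys are the stated lower limits (140 for "140 up to 145"), listed
    from the highest interval down, as in column (1) of Table 1.
    """
    counts = Counter((score // interval) * interval for score in scores)
    return dict(sorted(counts.items(), reverse=True))


first_scores = [185, 147, 173, 175, 197]  # first few scores of Table 1
for lower, f in tally(first_scores).items():
    print(f"{lower} up to {lower + 5}: {'/' * f}  f = {f}")
```

Because the lower limit is found by integer division, a score of 160 is always placed in "160 up to 165" and never in "155 up to 160".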


2. Methods of describing the limits of the class-intervals in a frequency 
distribution 


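TABLE 2

         (A)                         (B)                        (C)

Class-          Mid-        Class-              Mid-        Class-     Mid-
Intervals       point   f   Intervals           point   f   Intervals  point   f

195 up to 200    197    1   194.5 up to 199.5    197    1   195-199     197    1
190 up to 195    192    2   189.5 up to 194.5    192    2   190-194     192    2
185 up to 190    187    4   184.5 up to 189.5    187    4   185-189     187    4
180 up to 185    182    5   179.5 up to 184.5    182    5   180-184     182    5
175 up to 180    177    8   174.5 up to 179.5    177    8   175-179     177    8
170 up to 175    172   10   169.5 up to 174.5    172   10   170-174     172   10
165 up to 170    167    6   164.5 up to 169.5    167    6   165-169     167    6
160 up to 165    162    4   159.5 up to 164.5    162    4   160-164     162    4
155 up to 160    157    4   154.5 up to 159.5    157    4   155-159     157    4
150 up to 155    152    2   149.5 up to 154.5    152    2   150-154     152    2
145 up to 150    147    3   144.5 up to 149.5    147    3   145-149     147    3
140 up to 145    142    1   139.5 up to 144.5    142    1   140-144     142    1

               N = 50                          N = 50                N = 50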
(B) cover the same distances as in (A), but the upper and lower
limits of each interval are defined more exactly. We have seen 
(p. 5) that a score of 140 in a continuous series ordinarily means the 
interval 139.5 up to 140.5; and that a score of 144 means 143.5 up to 
144.5. Accordingly, to express precisely the fact that an interval 
begins with 140 and ends with 144, we may write 139.5 (the begin- 
ning of score 140) as the lower limit, and 144.5 (end of score 144 or 
beginning of score 145) as the upper limit of this step. The
class-intervals in (C) express the same facts more clearly than in (A) and
less exactly than in (B). Thus, “140-144” means that this interval 
begins with score 140 and ends with score 144; but the precise limits 
of the interval are not given. The diagram below will show how 
(A), (B), and (C) are three ways of expressing identically the 
same facts: 


Class-Interval

140 up to 145

139.5 up to 144.5

140-144

Interval                                        Interval
Begins      1      2      3      4      5       Ends
139.5      140    141    142    143    144      144.5


For the rapid tabulation of scores within their proper intervals, 
method (C) is to be preferred to (B) or (A). In (A) it is fairly easy, 
even when one is on guard, to let a score of 160, say, slip into the 
interval "155 up to 160," owing simply to the presence of 160 at the
upper limit of the interval. Method (B) is clumsy and time-consum- 
ing because of the need for writing .5 at the beginning and end of 
every interval. Method (C), while easiest for tabulation, offers the 
difficulty that in later calculations one must constantly remember 
that the expressed class limits are not the actual class limits: that 
interval “140-144” begins at 139.5 (not 140) and ends at 144.5 (not 
144). If this is clearly understood, method (C) is as accurate as (B) 
or (A). It will be generally used throughout this book. 
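
The relation among the three ways of writing an interval can be sketched as below. The function names are hypothetical, and whole-number scores (a unit of 1) are assumed.

```python
def exact_limits(stated_lower, stated_upper, unit=1):
    """Exact limits, as in method (B), for an interval written by method (C)."""
    half = unit / 2
    return stated_lower - half, stated_upper + half


def describe(stated_lower, stated_upper):
    """Write one interval in the three forms (A), (B), and (C)."""
    lower, upper = exact_limits(stated_lower, stated_upper)
    return {
        "A": f"{stated_lower} up to {stated_upper + 1}",
        "B": f"{lower} up to {upper}",
        "C": f"{stated_lower}-{stated_upper}",
    }


print(describe(140, 144))
# {'A': '140 up to 145', 'B': '139.5 up to 144.5', 'C': '140-144'}
```

Keeping the exact limits in a function of their own is one way to guard against the lapse the text warns of, namely treating the expressed limits of method (C) as the actual ones in later calculations.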

The scores grouped within a given interval in a frequency distribu- 
tion are assumed to be spread evenly over the entire interval. This 
assumption is made whether the interval is three, five, or ten units. 
If we wish to represent all of the scores within a given interval by 
some single value, the midpoint of the interval is taken to be the
logical choice. For example, in the interval 175-179 [Table 2, method
(C)] all eight scores upon this interval are represented by the
single value 177, the midpoint of the interval.* Why 177 is the
midpoint of this interval is shown graphically below:


Interval               Midpoint               Interval
Begins      1      2       3       4      5   Ends
174.5      175    176     177     178    179  179.5

A simple rule for finding the midpoint of an interval is

    Midpoint = lower limit of interval + (upper limit - lower limit)/2.

In our illustration, 174.5 + (179.5 - 174.5)/2 = 177. Since the interval
is five units, it follows that the midpoint must be 2.5 units from the
lower limit of the class, i.e., 174.5 + 2.5; or 2.5 units from the upper
limit of the class, i.e., 179.5 - 2.5.
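
The midpoint rule translates directly into code. In this minimal sketch the function name is our own, and the limits supplied must be the exact limits of the interval (174.5 and 179.5), not the expressed ones.

```python
def midpoint(lower_limit, upper_limit):
    """Midpoint = lower limit + (upper limit - lower limit) / 2."""
    return lower_limit + (upper_limit - lower_limit) / 2


print(midpoint(174.5, 179.5))  # 177.0
print(midpoint(139.5, 144.5))  # 142.0

# Exact lower limits of the twelve intervals of Table 2, lowest first.
lower_limits = [139.5 + 5 * i for i in range(12)]
print([midpoint(lo, lo + 5) for lo in lower_limits])
```

The last line reproduces the midpoint column of Table 2, running from 142 up to 197.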


sentative of all of the scores upon a given interval. Referring to
Table 1, we find that of the ten scores in the class-interval "170 up
to 175" (midpoint 172), three (170, 171, 171) are below the midpoint;
three (172, 172, 172) are on the midpoint; and four (173, 173,


* The same value (177) is, of course, the midpoint of the interval
when methods (A) and (B) are used.
Measures of central tendency (p. 28) and of variability (p. 43)
calculated from data grouped into intervals of five units, say, will
usually vary slightly from the same measures calculated from these 
data when ungrouped, or when grouped into intervals of, say, three 
or ten units. These variations arise from (1) differences in the size 
of the groups in which the data are classified, and (2) the fact that 
each score within an interval is assigned the value of the middle of 
the interval instead of its actual value. Corrections are sometimes 
applied to the measures of variability to correct the grouping error 
thus introduced. But usually the error which results from grouping
is so small that it may be neglected in ordinary statistical work.


III. The Graphic Representation of the Frequency
Distribution 


Aid in analyzing numerical data may often be obtained from a 
graphic or pictorial treatment of the frequency distribution. The
advertiser has long used graphic methods because these devices catch
the eye and hold the attention when the most careful array of statis- 
tical evidence fails to attract notice. For this and other reasons the 
research worker also utilizes the attention-getting power of visual 
presentation; and, at the same time, seeks to translate numerical 
facts—often abstract and difficult of interpretation—into more con- 
crete and understandable form. 

Four methods of representing a frequency distribution graphically
are in general use. These methods yield the frequency polygon, the
histogram, the cumulative frequency graph, and the cumulative
percentage curve or ogive. The first two graphic devices will be treated
in the following sections; the second two in Chapter 4.


1. Graphical representation of data: General principles


Before considering methods of constructing a frequency polygon 
or histogram, we shall review briefly the simple algebraic principles 
which apply to all graphical representation of data. Graphing or 
plotting is done with reference to two lines or coordinate axes, the
one the vertical or Y-axis, the other the horizontal or X-axis. These
basic lines are perpendicular to each other, the point where they
intersect being called O, or the origin. Figure 1 represents a system of
coordinate axes.
The origin is the zero point or point of reference for both axes.
Distances measured along the X-axis to the right of O are called positive,
distances measured along the X-axis to the left of O negative. In the


FIG. 1 A system of coordinate axes


(+ +). In the upper left division or second quadrant, x is minus and
y plus (- +). In the lower left, or third quadrant, both x and y are
negative (- -); while in the lower right or fourth quadrant, x is
plus and y minus (+ -).

To locate or plot a point "A" whose coordinates are x = 4, and
y = 3, we go out from O four units on the X-axis, and up from the
origin three units on the Y-axis. Where the perpendiculars to these
points intersect, we locate the point "A" (see Fig. 1). The point "B,"
whose coordinates are x = -5, and y = -7, is plotted in the third
quadrant by going left from O along the X-axis five units, and then
down seven units, as shown in the figure. In like manner, any points
"C" and "D" whose x and y values are known may be located with
reference to OY and OX, the coordinate axes. The distance of a
point from O on the X-axis is commonly called the abscissa; and the
distance of the point from O on the Y-axis the ordinate. The abscissa
of point "D" is +9, and the ordinate, -2.
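
The sign rules for the four quadrants can be sketched as a small function. The name `quadrant` is hypothetical, and returning `None` for a point lying on an axis is a choice made for this illustration, since the text does not treat that case.

```python
def quadrant(x, y):
    """Return the quadrant (1 to 4) of the point (x, y), or None on an axis."""
    if x == 0 or y == 0:
        return None
    if x > 0:
        return 1 if y > 0 else 4
    return 2 if y > 0 else 3


print(quadrant(4, 3))    # point "A": 1
print(quadrant(-5, -7))  # point "B": 3
print(quadrant(9, -2))   # point "D": 4
```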


2. The frequency polygon 


(1) CONSTRUCTION OF THE FREQUENCY POLYGON 


FIG. 2 Frequency polygon plotted from the distribution of fifty Army
Alpha scores given in Table 1, page 5


Figure 2 illustrates the use of the coordinate system in the
construction of a frequency polygon. This graph pictures the frequency
distribution of the 50 Army Alpha scores shown in Table 1, page 5.
The exact limits of the intervals are laid off at regular distances along
the base line (the X-axis) from the origin; and the frequencies within
each interval are measured off upon the Y-axis. There is one score on
the first interval, 140 up to 145 (Table 1, p. 5). To represent this
score on the diagram, we go out on the X-axis to 142, midway
between 139.5 and 144.5, and count up one Y-unit. The frequency on
the next interval, 145 up to 150, is three, hence the second point falls
midway between 144.5 and 149.5, three units above the X-axis. The
two scores on interval 150 up to 155, the four scores on 155 up to 160,
and the frequency on each succeeding interval, are represented in
every case by a point the specified number of scores (Y-units) above
the X-axis, and midway between the upper and lower limits of the
interval upon which the f lies. It is important in plotting a frequency
polygon to remember that the midpoint of an interval is always
taken to represent the entire interval. The height of the ordinate at
the midpoint represents all of the scores within the given interval.
When all of the points have been located, they are joined in regular
order to give the frequency polygon* shown in Figure 2. In
order to complete the figure, one interval (134.5 to 139.5) at the low
end, and one interval (199.5 to 204.5) at the high end of the
distribution have been included on the X-scale. The frequency on each of
these intervals is zero at the midpoint; hence by including them we
begin the frequency polygon one-half interval below the first, and
end it one-half interval above the last, class-interval on the X-axis.
In order to give symmetry and balance to a polygon, one must
exercise care in the selection of unit-distances to represent the
intervals on the X-axis and the frequencies on the Y-axis. A too-long
X-unit tends to stretch out the polygon, while a too-short X-unit
crowds the separate points. On the other hand, a too-long Y-unit
exaggerates the changes from interval to interval, and a too-short
Y-unit makes the polygon too flat. A good general rule is to select
X- and Y-units which will make the height of the figure approximately
75% of its width. The ratio of height to width may vary
from 60-80% and the figure still have good proportions; but it can
rarely go below 50% and leave the figure well balanced. The frequency
polygon in Figure 2 illustrates the "75% rule." There are
thirteen class-intervals laid off on the X-axis: twelve full intervals
plus one-half interval at the beginning and at the end of the range.
Hence, our polygon should be 75% of thirteen, or about ten X-axis
units high. These ten units (each equal to one interval) are laid off
on the Y-axis. To determine how many scores (f's) should be assigned
to each unit on the Y-axis, we divide 10, the largest f (on
interval 169.5 up to 174.5) by 10, the number of intervals laid off
on Y. The result (i.e., 1) shows that each Y-unit is exactly equal to
one f or score, as shown in Figure 2.
The polygon in Figure 5, page 18, furnishes another illustration
of this method of plotting a frequency polygon so as to preserve
balance. This polygon represents the distribution of 200 cancellation
scores shown in Table 3. There are ten intervals laid off along the
base line or X-axis: nine full intervals plus one-half interval at the
beginning and at the end of the range. Since 75% of 10 is 7.5, the
height of our figure could be either seven or eight X-axis units. To
determine the "best" value for each Y-unit, we divide 52, the largest
f (on 119.5 up to 123.5) by 7, getting 7 3/7; and then by 8, getting 6.5.
Using whole numbers for convenience, evidently we may lay off on
the Y-axis seven units, each representing eight scores; or eight units
each representing seven scores. The first combination was chosen
because a unit of eight f's is somewhat easier to handle than one of
seven. A slightly longer Y-unit representing ten f's would perhaps
have been still more convenient.

* Polygon means "many-sided figure."
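
The arithmetic of the "75% rule" for Figures 2 and 5 can be sketched as follows. The function name `y_scale_options` is hypothetical, and rounding the number of f's per Y-unit up to a whole number is our reading of "using whole numbers for convenience."

```python
import math


def y_scale_options(x_units, largest_f, ratio=0.75):
    """Candidate (height in X-axis units, f's per Y-unit) pairs.

    The height is `ratio` times the width, taken to the whole numbers just
    below and just above; each Y-unit must then hold enough f's for the
    largest frequency to fit within that height.
    """
    target = ratio * x_units
    heights = sorted({math.floor(target), math.ceil(target)})
    return [(h, math.ceil(largest_f / h)) for h in heights]


print(y_scale_options(13, 10))  # Figure 2: [(9, 2), (10, 1)]
print(y_scale_options(10, 52))  # Figure 5: [(7, 8), (8, 7)]
```

For Figure 2 the text adopts the second pair (ten units, one f each); for Figure 5 it adopts the first (seven units, eight f's each).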


TABLE 3 Scores made by 200 adults upon a cancellation test


Class-Interval = 4


Class-Intervals        Midpoint
Scores                    X          f
135.5 up to 139.5       137.5        3
131.5 up to 135.5       133.5        5
127.5 up to 131.5       129.5       16
123.5 up to 127.5       125.5       23
119.5 up to 123.5       121.5       52
115.5 up to 119.5       117.5       49
111.5 up to 115.5       113.5       27
107.5 up to 111.5       109.5       18
103.5 up to 107.5       105.5        7
                                N = 200


The total frequency (N) of a distribution is represented by the
area of its polygon; that is, the area bounded by the frequency sur- 
face and the X-axis. The area lying above any given interval, how- 
ever, cannot be taken as proportional to the number of cases within 
the interval because of the irregularities in the distribution and con- 
sequently in the frequency surface. To show the positions of the 
mean and the median in the graph, we may locate these measures on 
the X-axis as shown in Figures 2 and 5. Perpendiculars erected at 
these points show the approximate frequency at the mean and at 
the median. 

Steps involved in constructing a frequency polygon may be sum- 
marized as follows: 


(1) Draw two straight lines perpendicular to each other, the vertical line
near the left side of the paper, the horizontal line near the bottom.
Label the vertical line (the Y-axis) OY, and the horizontal line (the
X-axis) OX. Put the O where the two lines intersect. This point is
the origin.

(2) Lay off the intervals of the frequency distribution at regular distances
along the X-axis. Begin with the lower limit of the interval next below
the lowest in the distribution, and end with the upper limit of the
interval next above the highest in the distribution. Label the successive
X distances with the interval limits. Select an X-unit which will allow
all of the intervals to be represented easily on the graph paper.

(3) Mark off on the Y-axis successive units to represent the scores (the
frequencies) on the different intervals. Choose a Y-scale which will
make the largest frequency (the height) of the polygon approximately
75% of the width of the figure.

(4) At the midpoint of each interval on the X-axis go up in the Y direction
a distance equal to the number of scores on the interval. Place points at
these locations.

(5) Join the points plotted in (4) with straight lines to give the frequency
surface.
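
Steps (2) and (4) above can be sketched as a function that returns the points to be joined in step (5). The function name and its arguments are our own; the frequencies are those of Table 1, listed from the lowest interval upward, and whole-number scores are assumed.

```python
def polygon_points(lowest_score_limit, frequencies, interval):
    """Return (midpoint, f) pairs for a frequency polygon.

    `lowest_score_limit` is the expressed lower limit of the lowest interval
    (140 for "140-144"). One interval of zero frequency is added below the
    lowest and above the highest interval, as in step (2).
    """
    padded = [0] + list(frequencies) + [0]
    # Exact lower limit of the added interval next below the lowest one.
    start = lowest_score_limit - 0.5 - interval
    return [(start + interval * i + interval / 2, f)
            for i, f in enumerate(padded)]


table_1_f = [1, 3, 2, 4, 4, 6, 10, 8, 5, 4, 2, 1]  # 140-144 up to 195-199
points = polygon_points(140, table_1_f, 5)
print(points[:3])   # [(137.0, 0), (142.0, 1), (147.0, 3)]
print(points[-2:])  # [(197.0, 1), (202.0, 0)]
```

The list of points may then be handed to any plotting tool and joined with straight lines; the choice of X- and Y-units in step (3) remains a matter of the graph paper or plotting area available.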


(2) SMOOTHING THE FREQUENCY POLYGON
Because the sample is small (N = 50) and the frequency distribution
somewhat irregular, the polygon in Figure 2 tends to be jagged
in outline. To iron out chance irregularities, and also to get a better
notion of how the figure might look if the data were more numerous,
the frequency polygon may be "smoothed" as shown in Figure 3,
below. In smoothing, a series of "moving" or "running" averages
is taken, from which new or adjusted frequencies are determined.
The method is illustrated in Figure 3.

To find an adjusted or "smoothed" f, we add together the f on the
given interval and the f's on the two adjacent intervals (the one just
below and the one just above) and divide the sum by 3. For example,
the smoothed f for interval 174.5 up to 179.5 is (5 + 8 + 10)/3 or 7.67;
for interval 154.5 up to 159.5, (4 + 4 + 2)/3 or 3.33. The smoothed f's
for the other intervals may be found in the table below Figure 3.


FIG. 3. Original and smoothed frequency polygon. The original and
smoothed f's are given below.
(Data from Table 1, p. 5)


Scores       f    Smoothed f
200-204      0       .33
195-199      1      1.00
190-194      2      2.33
185-189      4      3.67
180-184      5      5.67
175-179      8      7.67
170-174     10      8.00
165-169      6      6.67
160-164      4      4.67
155-159      4      3.33
150-154      2      3.00
145-149      3      2.00
140-144      1      1.33
135-139      0       .33
            50     50.00


To find the smoothed f's for the two intervals at the extremes of the
original distribution, namely, 139.5 up to 144.5, and 194.5 up to 199.5,
a slightly different procedure is necessary. Here we add 0, the f on the
step below or above, the f on the given step, and the f on the adjacent
step and divide by 3. This procedure makes the smoothed f for 139.5 up
to 144.5, (0 + 1 + 3)/3 or 1.33, and the smoothed f for 194.5 up to
199.5, (2 + 1 + 0)/3 or 1.00. The smoothed f for the intervals 134.5 up
to 139.5 and 199.5 up to 204.5, for which the frequency in the original
distribution is 0, is in each case (0 + 0 + 1)/3 or .33. Note that if we
omit these two intervals the N for the smoothed distribution will be
less than 50, since the smoothed distribution has frequencies outside
the range of the original distribution.
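
The smoothing rule just described is a three-point running average. The sketch below is a minimal illustration in Python; the function name `smooth` is hypothetical, and the frequencies are listed from the lowest interval (140-144) to the highest (195-199).

```python
def smooth(frequencies):
    """Three-point running average, extended one empty interval each way."""
    # Two zeros are padded on each side: one is the empty interval just
    # outside the distribution, the other is that interval's own
    # (also empty) outer neighbour.
    padded = [0, 0] + list(frequencies) + [0, 0]
    return [sum(padded[i:i + 3]) / 3 for i in range(len(padded) - 2)]


# Fifty Army Alpha scores, intervals 140-144 up to 195-199.
original = [1, 3, 2, 4, 4, 6, 10, 8, 5, 4, 2, 1]
smoothed = smooth(original)

# First value belongs to 135-139, last value to 200-204.
print([round(f, 2) for f in smoothed])
print(round(sum(smoothed), 2))  # total N is preserved
```

Because every original f enters three averages and each average is divided by 3, the smoothed frequencies still total 50 when the two outside intervals are included.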

If the already smoothed f's in Figure 3 are subjected to a second
smoothing, the outline of the frequency surface will become more
nearly a continuous flowing curve. It is doubtful, however, whether
so much adjustment of the original f's is often warranted. When an
investigator presents only the smoothed frequency polygon and does
not give his original data, it is impossible for a reader to tell with
what he started. Moreover, smoothing gives a picture of what an
investigator might have gotten (not what he did get) if his data had
been more numerous, or less subject to error than they were. If N is
large, smoothing may not greatly change the shape of a graph, and
hence is often unnecessary. The frequency polygon in Figure 5,
page 18, for example, which represents the distribution of 200
cancellation test scores, is quite regular without any adjustment of the
ordinate (i.e., the Y) values. Probably the best course for the
beginner to follow is to smooth data as little as possible. When
smoothing seems to be indicated in order better to bring out the facts,
one should be careful always to present original data along with
"adjusted" results.


3. The histogram or column diagram


A second way of representing a frequency distribution graphically
is by means of a histogram or column diagram. This type of graph
is illustrated in Figure 4, page 17, for the same distribution of scores
represented by the frequency polygon in Figure 3, page 14. The two
figures are constructed in much the same way, with this important
difference: In a frequency polygon all of the scores within a given
interval are represented by the midpoint of that interval, while in a
histogram the assumption is made that scores are spread uniformly
over their intervals. The measures within each interval of a
histogram, therefore, are represented by a rectangle, the base of which
equals the interval, and the height of which equals the number of
scores (the f) within the interval. Thus the one score upon interval
139.5 up to 144.5 is represented by a rectangle whose base equals the
length of the interval, and whose height equals one unit measured off
on the Y-axis. The three scores within the next interval, 144.5 up to
149.5, are represented by a rectangle one interval long and three
Y-units high. The altitudes of the other rectangles vary with the
number of f's upon the intervals, the bases all being one interval
long. When the same number of scores falls within two or more
adjacent intervals, as in the intervals 154.5 up to 159.5, and 159.5
up to 164.5, the top of the rectangle covers two or more intervals on
the X-axis. The highest rectangle is, of course, that one (on interval
169.5 up to 174.5) which has 10, the largest frequency, as its altitude.
In selecting scales for the X- and Y-axes, the same considerations, as
to height and width of figure, outlined on page 12 for the frequency
polygon, should be observed.

Although in a histogram each interval is represented by a separate
rectangle, it is not necessary to project the sides of the rectangles
to the base line as is done in Figure 4, below. The rise or fall of
the boundary line shows the increase or decrease in the number of
scores from interval to interval and is usually the important fact to
be brought out (see Fig. 5). As in a frequency polygon, the total
frequency (N) is represented by the area of the histogram. In contrast
to the frequency polygon, however, the area of each rectangle in a
histogram is directly proportional to the number of measures within
the interval. For this reason, the histogram presents an accurate
picture of the relative proportions of the total frequency from interval
to interval.


FIG. 4 Histogram of the fifty Army Alpha scores shown in Table 1,
page 5

In order to provide a more detailed comparison of the two types of
frequency graph, the distribution in Table 3, page 13, is plotted
upon the same coordinate axes in Figure 5, page 18, as a frequency
polygon and as a histogram. The increased number of cases and the
more symmetrical arrangement of scores in the distribution make
these figures more regular in appearance than those in Figures 2
and 4.
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FIG. 5 Frequency polygon and histogram of 200 cancellation scores 
shown in Table 3, page 13 


4. Plotting two frequency distributions on the same axes, when samples 
differ in size 


Table 4 gives the distributions of scores on an achievement
examination made by two groups, A and B, which differ considerably in
size. Group A has 60 cases, Group B, 160 cases. If the two
distributions in Table 4 are plotted as polygons or as histograms on the
same coordinate axes, the fact that the f's of Group B are so much larger
than those of Group A makes it hard to compare directly the range
and quality of achievement in the two groups. A useful device in
cases where the N's differ in size is to express both distributions in
percentage frequencies as shown in Table 4. Both N's are now 100,
and the f's are comparable from interval to interval. For example,
we know at once that 26.7% of Group A and 30% of Group B made
scores of 50 through 59, and that 5% of the A's and 7.5% of the B's
scored from 70 to 79. Frequency polygons representing the two
distributions, in which percentage frequencies instead of original f's
have been plotted on the same axes, are shown in Figure 6. These
polygons provide an immediate comparison of the relative
achievement of our two groups not given by polygons plotted from
original frequencies.


TABLE 4

    (1)           (2)           (3)           (4)           (5)
Achievement     Group A       Group B       Group A       Group B
Examination   Frequencies   Frequencies   Percentage    Percentage
  Scores                                  Frequencies   Frequencies
  80-89            0             9            0.0           5.6
  70-79            3            12            5.0           7.5
  60-69           10            32           16.7          20.0
  50-59           16            48           26.7          30.0
  40-49           12            27           20.0          16.9
  30-39            9            20           15.0          12.5
  20-29            6            12           10.0           7.5
  10-19            4             0            6.7           0.0
                  60           160          100.1         100.0


FIG. 6 Frequency polygons of the two distributions in Table 4. Scores
are laid off on the X-axis, percentage frequencies on the Y-axis


Percentage frequencies are readily found by dividing each f by N
and multiplying by 100. Thus 3/60 × 100 = 5.0. A simple method of
finding percentage frequencies when a calculating machine is
available is to divide 100 by N and, putting this figure in the machine, to
multiply each f in turn by it.

For example: 1.667 (i.e., 100/60) × 3 = 5.0; 1.667 × 10 = 16.7,
etc.; .625 (i.e., 100/160) × 9 = 5.6, .625 × 12 = 7.5, etc. What
percentage frequencies do, in effect, is to scale each distribution down to
the same total N of 100, thus permitting a comparison of f's for
each interval.
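
The calculating-machine method carries over directly to a short program. The sketch below is illustrative only: the function name `percentage_frequencies` is hypothetical, and the two lists hold the Group A and Group B frequencies of Table 4, highest interval first.

```python
def percentage_frequencies(frequencies):
    """Scale a frequency distribution to a total N of 100."""
    n = sum(frequencies)
    factor = 100 / n  # e.g. 100/60 = 1.667 for Group A
    return [round(f * factor, 1) for f in frequencies]


# Table 4, intervals 80-89 down to 10-19.
group_a = [0, 3, 10, 16, 12, 9, 6, 4]      # N = 60
group_b = [9, 12, 32, 48, 27, 20, 12, 0]   # N = 160

for a, b in zip(percentage_frequencies(group_a),
                percentage_frequencies(group_b)):
    print(f"{a:5.1f}  {b:5.1f}")
```

Since each percentage is rounded separately, a column of percentage frequencies may total 100.1 or 99.9 rather than exactly 100.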
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5. When to use the frequency polygon and when to use the histogram 


The question of when to use the frequency polygon and when to
use the histogram cannot be answered by a general rule which will 
cover all cases. The frequency polygon is less exact than the histo- 
gram in that it does not represent accurately, i.e., in terms of area, 
the number of measures within successive intervals. In comparing 
two or more graphs plotted on the same axes, however, the frequency 
polygon is the more useful, since the vertical and horizontal lines in 
the two histograms will often coincide. Both the histogram and the 
frequency polygon tell the same story and both are useful in enabling 
us to show in graphic form whether the scores of a group are dis- 
tributed symmetrically or whether they are piled up at the low or at 
the high end of the scale. Not only information with regard to the 
group, but information with regard to the test, may be secured from 
a graph. If a test is too easy, the scores will crowd the high end of 
the scale; if the test is too hard, the scores will pile up at the low end 
of the scale. If the test is well suited to the group, scores will tend to 
be distributed symmetrically around the mean, a few individuals 
scoring high, a few low, and the majority scoring somewhere near the 
middle of the scale. When this happens, the frequency graph approx- 
imates the “ideal” or normal frequency curve described in Chapter 5. 


IV. Standards of Accuracy in Computation *


"How many places" to carry numerical results is a question which
arises persistently in statistical computation. Sometimes a student,
by discarding decimals, throws away legitimate data. More often,
however, he tends to retain too many decimals, a practice which may
give a false appearance of great precision not always justified by the
original material.

In this section are given some of the generally accepted principles
which apply to statistical calculation. Observance of these rules
will lead to greater uniformity in calculation. They should be
followed carefully in solving the problems given in this book.

* This section should be reviewed frequently, and referred to in
solving the problems given in succeeding chapters.


1. Rounded numbers


In calculation, numbers are usually "rounded" off to the standard
of accuracy demanded by the problem. If we round off 8.6354 to two
decimals it becomes 8.64; to one decimal, 8.6; to the nearest
integer, 9. Measures of central tendency and variability, coefficients
of correlation, and other measures, are rarely reported to more than
two decimal places. A mean of 52.6872, for example, is usually
reported as 52.69; a standard deviation of 12.3841 as 12.38; and a
coefficient of correlation of .6350 as .63, etc. It is very doubtful
whether much of the work in mental measurement warrants accuracy
beyond the second decimal. Convenient rules for rounding numbers
to two decimals are as follows: When the third decimal is less than
5, drop it; when greater than 5, increase the preceding figure by 1;
when exactly 5, compute the fourth decimal and correct back to the
second place; when exactly 5 followed by zeros, drop it and make
no correction.
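
These rounding rules differ from the default behavior of most programming languages in one respect: a 5 followed only by zeros is dropped. The sketch below shows one way to reproduce them in Python. It assumes that the rule is equivalent to the `ROUND_HALF_DOWN` mode of the standard `decimal` module (round to nearest, ties toward zero); the function name `round_two_places` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_DOWN


def round_two_places(value: str) -> Decimal:
    """Round to two decimals; an exact trailing 5 is dropped, not raised."""
    # The value is passed as a string so that it is held exactly as written.
    return Decimal(value).quantize(Decimal("0.01"), rounding=ROUND_HALF_DOWN)


for number in ["8.6354", "52.6872", "12.3841", ".6350"]:
    print(number, "->", round_two_places(number))
```

Passing the numbers as strings avoids binary floating-point representation, under which a value such as .635 may not be stored as an exact 5 in the third place.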


2. Significant figures 


The measurement 64.3 inches is assumed to be correct to the
nearest tenth of an inch, its true value lying somewhere between 64.25
and 64.35 inches. Two places to the left of the decimal point, and
one to the right are fixed, and hence 64.3 is said to contain three
significant figures. The numbers 643 and .643 also contain three
significant figures each.

In the number .003046 there are four significant figures, 3, 0, 4,
and 6, the first two zeros serving merely to locate the decimal point.
When used to locate a decimal point only, a zero is not considered to
be a significant figure; .004, for example, has only one significant
figure, the two zeros simply fixing the position of 4, the significant
digit. The following illustrations should make clear the matter of
significant figures:


136      has three significant figures.
136,000  has three significant figures also. The true value of this
         number lies between 135,500 and 136,500. Only the first three
         digits are definitely fixed, the zeros serving simply to locate
         the decimal point or fix the size of the number.
1360.    has four significant figures; the decimal indicates that the
         zero in the fourth place is known, and hence significant.
.136     has three significant figures.
.1360    has four significant figures; the zero fixes the fourth place.
.00136   has three significant figures; the first two zeros merely
         locate the decimal point.


3. Exact and approximate numbers


It is necessary in calculation to make a distinction between exact
and approximate numbers. An exact number is one which is found
by counting: ten children, 150 test scores, twenty desks are
examples. Approximate numbers result from the measurement of variable
quantities. Test scores and other measures, for example, are
approximate since they are represented by intervals and not exact points on
some scale. Thus a score of 61 may be any value from 60.5 up to
61.5 and a measured height of 47.5 inches may be any value from
47.45 up to 47.55 inches (see p. 3). Calculations with exact
numbers may, in general, be carried to as many decimals as we please,
since we may assume as many significant figures as we wish. For
example, 110 test scores, which means that exactly 110 subjects were
tested, could be written N = 110.000 . . . , i.e., to n significant figures.
Calculations based upon approximate numbers depend upon, and
are limited by, the number of significant figures in the numbers which
enter into the calculations. This will be made clearer in the
following rules:


4. Rules for computation 


(1) ACCURACY OF A PRODUCT 


(a) The number of significant figures in the product of two or
more approximate numbers will equal the number of significant
figures in that one of the numbers which is the least accurate, i.e., which
contains the smallest number of significant figures. To illustrate:


125.5 × 7.0   = 880, not 878.5, because 7.0, the less accurate of the two
                numbers, contains only two significant figures. The number
                125.5 contains four significant figures.
125.5 × 7.000 = 878.5. Both numbers now contain four significant figures;
                hence their product also contains four significant figures.


(b) When multiplying an exact number by an approximate
number, the number of significant figures in the product is determined
by the number of significant figures in the approximate number. To
illustrate:


If each of 12 children (12 is an exact number) has an M.A. of 8 years
(8 is an approximate number) the product 12 × 8 must be written either
as 90 or 100, since the approximate number has only one significant digit.
If, however, each M.A. of 8 years can be written as 8.0, the product
12 × 8.0 can be written as 96, since 8.0 contains two significant
digits.


(2) ACCURACY OF A QUOTIENT 
(a) When dividing one approximate number by another
approximate number, the significant figures in the quotient will equal the
significant figures in that one of the two numbers (dividend or
divisor) which is less accurate, i.e., which has the smaller number of
significant digits. Illustrations:


9.27/41    should be written .23, not .22609, since 41 (the less accurate
           number) contains only two significant figures.
.16/47.24  should be written .0034, not .0033869, since .16 (the less
           accurate number) has two significant figures.

(b) In dividing an approximate number by an exact number, the
number of significant figures in the quotient will equal the number
of significant figures in the approximate number. Illustrations:


9.27/41    should be written .226, since 9.27, the approximate number,
           has three significant figures. The number 41 is an exact number.
8541/50    should be written 170.8, not 170.82, since 8541, the
           approximate number, contains only four significant figures.

(c) In dealing with exact numbers, quotients may be written to
as many decimals as one wishes.
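
The rules for products and quotients can be checked with a small helper that rounds a result to a stated number of significant figures. The sketch below is only an illustration: the function name `round_sig` is hypothetical, the number of significant figures is supplied by the reader rather than detected, and the tie-breaking mode (`ROUND_HALF_DOWN`) is an assumption carried over from the rounding rules of this section.

```python
from decimal import Decimal, ROUND_HALF_DOWN


def round_sig(value, figures: int) -> Decimal:
    """Round a number to the given count of significant figures."""
    d = Decimal(value)
    if d == 0:
        return d
    # adjusted() is the power of ten of the leading digit, so this is
    # the power of ten of the last significant place to be kept.
    exponent = d.adjusted() - figures + 1
    return d.quantize(Decimal(1).scaleb(exponent), rounding=ROUND_HALF_DOWN)


# Product: 7.0 has only two significant figures.
print(round_sig(Decimal("125.5") * Decimal("7.0"), 2))   # 8.8E+2, i.e. 880
# Quotient of two approximate numbers: 41 has two significant figures.
print(round_sig(Decimal("9.27") / Decimal("41"), 2))     # 0.23
# Approximate number divided by an exact number: 8541 has four.
print(round_sig(Decimal("8541") / Decimal("50"), 4))     # 170.8
```

Python prints the first result in exponent form (8.8E+2), which makes explicit that only two figures are significant.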


(3) ACCURACY OF A ROOT OR POWER

(a) The square root of an approximate number can contain no
more significant figures than there are in the number itself. The
number of significant figures retained in a square root is usually less
than (often one-half) the number of significant figures in the number.
For example, √159.5600 is usually written 12.63, and not 12.63176,
although the original number, 159.5600, contains seven significant
figures.

(b) The square, or higher power, of an approximate number
contains as many significant figures as there are in the original number
(and no more). For example, (.034)² = .0012 (two significant
figures) and not .001156 (four significant figures).

(c) Roots and powers of exact numbers may be taken to as many 
decimal places as one wishes. 


(4) ACCURACY OF A SUM OR DIFFERENCE
The number of decimal places to be retained in a sum or difference
should be no greater than the number of decimals in the least
accurate of the numbers added or subtracted. Illustrations:


362.2 + 18.225 + 5.3062 = 385.7, not 385.7312, since the least accurate
                          number (362.2) contains only one decimal.
362.2 − 18.245          = 344.0, not 343.955, since the less accurate
                          number (362.2) contains only one decimal.
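
For sums and differences the limit is set by decimal places rather than by significant figures. A minimal sketch follows; the function name `add_approximate` is hypothetical, the numbers are passed as strings so that the number of decimals written is preserved, and a difference is expressed by giving the subtracted number a minus sign.

```python
from decimal import Decimal, ROUND_HALF_DOWN


def add_approximate(*terms: str) -> Decimal:
    """Add numbers and keep only as many decimals as the least accurate."""
    values = [Decimal(t) for t in terms]
    # The exponent of a Decimal is minus its number of decimal places.
    places = min(max(0, -v.as_tuple().exponent) for v in values)
    total = sum(values, Decimal(0))
    return total.quantize(Decimal(1).scaleb(-places), rounding=ROUND_HALF_DOWN)


print(add_approximate("362.2", "18.225", "5.3062"))  # 385.7
print(add_approximate("362.2", "-18.245"))           # 344.0
```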


PROBLEMS 


1. Indicate which of the following variables fall into continuous and which
into discrete series: (a) time; (b) salaries in a large business firm; (c)
sizes of elementary school classes; (d) age; (e) census data; (f) distance
traveled by car; (g) football scores; (h) weight; (i) numbers of pages
in 100 books; (j) mental ages.

2. Write the exact upper and lower limits of the following scores in ac- 
cordance with the two definitions of a score in continuous series, given 
on pages 3 and 4: 


62 175 1 
8 312 87 


3. Suppose that sets of scores have the ranges given below. Indicate how
large an interval, and how many intervals, you would suggest for use in 
drawing up a frequency distribution of each set. 


Range Size of Interval Number of Intervals 
16 to 87 

0 to 46 
110 to 212 
63 to 151 

4 to 12 


4. In each of the following write (a) the exact lower and upper limits of
the class-intervals (following the first definition of a score, given on page
3), and (b) the midpoint of each interval.
45-47    162.5-167.5    63-67    0-9
1-4      80 up to 90    16-17    25-28

5. (a) Tabulate the following twenty-five scores into two frequency
distributions, using (1) an interval of three, and (2) an interval of five
units. Let the first interval begin with score 60.


72           75           77           67   72
[illegible]  78           65           86   73
67           82           76           76   70
83           [illegible]  [illegible]  72   72
61           67           84           69   64

(b) The following 100 scores were made on the Thorndike Intelligence
Examination for High School Graduates by applicants for
admission to college. Tabulate these scores into three frequency
distributions, using class-intervals of three, five, and ten units. Let the first
interval begin with score 45.


Уз 78 76 58 95 
78 86 80 96 94 
46 78 92 86 88 
82 101 102 70 50 
14 65 73 72 91 

103 90 87 74 83 
78 75 70 84 98 
86 73 85 99 93 

103 90 70 81 83 
87 86 93 50 76 
73 86 82 71 94 
95 84 90 73 75 
82 86 83 63 56 
80 76 81 105 73 
73 75 85 74 95 
92 83 72 98 110 
85 103 81 78 98 
80 86 96 78 71 
81 84 81 83 92 
90 85 85 96 72 


6. The following lists represent the final grades made by two sections of the
same course in general psychology.


(a) Tabulate the grades into frequency distributions using an interval
of 5. Begin with 45 in Section I and 50 in Section II.
(b) Represent these frequency distributions as frequency polygons on
the same axes.

Section I (N = 64)          Section II (N = 46)
70 1 67 90 51 70 90 84 73 78 58 84 
ӨП 79) 81.481 B8 7. 72 80 74 86 52 74 
51° 76 76 90 71 72 62 90 87 92 78 62 
89 90 76 71 88 66 81 82 76 85 85 90 
91 71 65 68 65 76 84 79 54 94 81 
79 80 71 76 54 80 70 97 65 66 77 
72 63 87 91 90 45 80 69 56 57 
69 66 80 79 71 75 47:98: 717168 
58 50/ 47: 67 67 52 62 95 65 ТІ 


64 8: 54 70 80 92 М9: ©8575 70 7071 
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7. (a) Plot frequency polygons for the two distributions of 25 scores found 
in 5(a), using intervals of 3 and of 5 score units. Smooth both
distributions (see p. 14) and plot the smoothed f's and the original
scores on the same axes.

(b) Plot a frequency polygon of the 100 scores in 5(b) using an interval 
of 10 score units. Superimpose a histogram upon the frequency 
polygon. 

(c) On the same axes, plot a frequency polygon and histogram of the 
100 Thorndike scores using an interval of 5 score units. Smooth the 
frequency polygon and plot on the same diagram. 


8. Reduce the distributions A and B below to percentage frequencies and
plot them as frequency polygons on the same axes. Is your
understanding of the achievement of these groups advanced by this treatment of
the data?


Scores Group А Group B 
52-55 1 8 
48-51 0 5 
44-47 5 12 
40-43 10 58 
36-39 20 40 
32-35 12 22 
28-31 8 10 
24-27 2 15 
20-23 3 5 
16-19 4 0 
65 175 

9. (a) Round off the following numbers to two decimals:

3.5872     74.168    126.83500
46.9223    25.193    81.72558

(b) How many significant figures in each of the following:
.00046    91.00     1.03
46.02     18.365    15.0048


(c) Write the answers to the following:
127.4 × .0036 =    (both numbers approximate)
200.0 ÷ 5.63 =     (both numbers approximate)
62 × .053 =        (first number exact, second approximate)
364.2 + 61.596 =
364.2 − 61.596 =
√47.86 =
(18.6)² =




ANSWERS


2. 61.5 to 62.5 and 62.0 to 63.0;    174.5 to 175.5 and 175.0 to 176.0;
   7.5 to 8.5 and 8.0 to 9.0;        311.5 to 312.5 and 312.0 to 313.0;
   .5 to 1.5 and 1.0 to 2.0;         86.5 to 87.5 and 87.0 to 88.0


3. Size of Interval    No. of Intervals
   5                   15
   3 or 4 or 5         16 or 12 or 10
   10                  11
   5 or 10             18 or 9
   1                   9

4. Limits              Midpoint
   44.5 to 47.5        46.0
   .5 to 4.5           2.5
   162.5 to 167.5      165.0
   79.5 to 89.5        84.5
   62.5 to 67.5        65.0
   15.5 to 17.5        16.5
   −.5 to 9.5          4.5
   24.5 to 28.5        26.5

9. (a) 3.59     74.17    126.83
       46.92    25.19    81.73
   (b) 2    4    3
       4    5    6
   (c) .46
       35.5
       3.3
       425.8
       302.6
       6.918 or 6.92
       346
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MEASURES OF CENTRAL TENDENCY 


* 


When scores or other measures have been tabulated into a
frequency distribution, as shown in Chapter 1, usually the next task
is to calculate one or more measures of central tendency. The value
of a measure of central tendency is twofold. First, it is a single
measure which represents all of the scores made by the group, and
as such gives a concise description of the performance of the group
as a whole; and second, it enables us to compare two or more groups
in terms of typical performance. There are three "averages" or
measures of central tendency in common use, (1) the arithmetic
mean, (2) the median, and (3) the mode. Popularly, the average is
used for the arithmetic mean. In statistical work, however, average
is often used as a general term for any measure of central tendency.


I. Calculation of Measures of Central Tendency
1. The arithmetic mean or "average" (M)


(1) CALCULATION OF THE MEAN WHEN DATA ARE UNGROUPED

The arithmetic mean, or simply the mean, is the best known
measure of central tendency. It may be defined as the sum of the separate
scores or other measures divided by their number. To illustrate: if a
man earns $3, $4, $3.50, $5, and $4.50 on five successive days his
mean daily wage ($4.00) is obtained by dividing the sum of his daily
earnings by the number of days he has worked. The formula for the
arithmetic mean (M) of a series of ungrouped measures is


$$M = \frac{\Sigma X}{N} \tag{1}$$
(arithmetic mean calculated from ungrouped data)

in which N is the number of measures in the series, X stands for a
score or other measure, and the symbol Σ means "sum of," here sum
of scores.


(2) CALCULATION OF THE MEAN FROM DATA GROUPED INTO A
FREQUENCY DISTRIBUTION

When measures have been grouped into a frequency distribution,
the arithmetic mean is calculated by a slightly different method from
the one given above. The two illustrations given in Table 5, page 30,
will make the differences clear. The first example shows the
calculation of the mean of the 50 Army Alpha scores which were tabulated
into a frequency distribution in Table 1. First calculate the fX
column by multiplying the midpoint (X) of each interval by the
number of scores (f) on it; the mean (170.80) is then simply the sum of
the fX (namely, 8540) divided by N (50). The use of the midpoint
for all of the scores within an interval is made necessary by the fact
that scores grouped into intervals lose their identity and must
thereafter be represented by the midpoint of that particular interval in
which they fall. Hence, we multiply the midpoint of each interval
by the frequency upon that interval; add the fX and divide by N
to obtain the mean. The formula may be written


$$M = \frac{\Sigma fX}{N} \tag{2}$$
(arithmetic mean calculated from scores grouped into a frequency
distribution)


The second example in Table 5 is another illustration of the
calculation of the mean from grouped data. This frequency distribution
represents 200 scores made by a group of adults upon a cancellation
test. Scores have been classified by method (B), page 6, into 9
class-intervals; and since the intervals are 4 units, the midpoints are
found by adding one-half of 4 to the lower limit of each. For
example, in the first interval, 103.5 + 2.0 = 105.5. The fX column totals
23,888.0; and N equals 200. Hence, applying formula (2), the
arithmetic mean is found to be 119.44 (to two decimals).
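
Both forms of the mean are easily verified by machine. The following Python sketch is an illustration only; the function names `mean` and `grouped_mean` are hypothetical, and the midpoints and frequencies are those of the two distributions described above, lowest interval first.

```python
def mean(scores):
    """Arithmetic mean of ungrouped measures: sum of X divided by N."""
    return sum(scores) / len(scores)


def grouped_mean(midpoints, frequencies):
    """Mean of a frequency distribution: sum of fX divided by N."""
    n = sum(frequencies)
    return sum(f * x for x, f in zip(midpoints, frequencies)) / n


# Daily wages over five days.
print(mean([3, 4, 3.5, 5, 4.5]))                         # 4.0

# Fifty Army Alpha scores: midpoints 142, 147, ..., 197.
alpha_midpoints = list(range(142, 198, 5))
alpha_f = [1, 3, 2, 4, 4, 6, 10, 8, 5, 4, 2, 1]
print(round(grouped_mean(alpha_midpoints, alpha_f), 2))  # 170.8

# 200 cancellation scores: midpoints 105.5, 109.5, ..., 137.5.
canc_midpoints = [105.5 + 4 * k for k in range(9)]
canc_f = [7, 18, 27, 49, 52, 23, 16, 5, 3]
print(round(grouped_mean(canc_midpoints, canc_f), 2))    # 119.44
```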

In both of the illustrations in Table 5, the M of the scores made
by the members of a group was found. We may, however, use either
formula (1) or (2) to calculate the M of a number of measurements
made upon the same individual. If an individual's reaction time to
light is measured 100 times, and the measures tabulated into a
frequency distribution, the M is found in exactly the same way in which
we compute the "average" reaction time to light of 100 different
observers.


TABLE 5 The calculation of the mean, median, and crude mode from
data grouped into a frequency distribution


1. Data from Table 1, fifty Army Alpha scores
Class-interval = 5

Class-Intervals    Midpoint
    Scores            X          f         fX

   195-199           197         1         197
   190-194           192         2         384
   185-189           187         4         748
   180-184           182         5         910
   175-179           177         8        1416
   170-174           172        10        1720
   165-169           167         6        1002
   160-164           162         4         648
   155-159           157         4         628
   150-154           152         2         304
   145-149           147         3         441
   140-144           142         1         142
                             N = 50       8540

(20 scores lie above interval 170-174 and 20 scores lie below it.)

N/2 = 25

(1) Mean = ΣfX/N = 8540/50 = 170.80
(2) Median = 169.5 + 5/10 × 5 = 172.00
(3) Crude Mode falls on class-interval 170-174 or at 172.00


2. Scores made by 200 adults upon a cancellation test
Class-interval = 4

Class-Intervals    Midpoint
    Scores            X          f         fX

135.5 to 139.5      137.5        3        412.5
131.5 to 135.5      133.5        5        667.5
127.5 to 131.5      129.5       16       2072.0
123.5 to 127.5      125.5       23       2886.5
119.5 to 123.5      121.5       52       6318.0
115.5 to 119.5      117.5       49       5757.5
111.5 to 115.5      113.5       27       3064.5
107.5 to 111.5      109.5       18       1971.0
103.5 to 107.5      105.5        7        738.5
                             N = 200    23888.0

(99 scores lie above interval 115.5 to 119.5 and 52 scores lie below it.)

N/2 = 100

(1) Mean = ΣfX/N = 23,888.0/200 = 119.44
(2) Median = 115.5 + 48/49 × 4 = 119.42
(3) Crude Mode falls on class-interval 119.5 to 123.5 or at 121.50
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(3) THE MEAN FROM COMBINED SAMPLES OR GROUPS 
Suppose that on a certain test the mean for a group of 10 children
is 62, and that on the same test the mean for a group of 40 children is
66. Then the mean of the two groups combined is
(62 × 10 + 66 × 40)/50 or 65.2. The formula for the weighted mean of
n groups is


$$M_{comb} = \frac{N_1 M_1 + N_2 M_2 + \cdots + N_n M_n}{N_1 + N_2 + \cdots + N_n} \tag{3}$$
(weighted arithmetical mean obtained from combining
n groups)

When only two groups have been combined, the weighted mean is

$$M_{comb} = \frac{N_1 M_1 + N_2 M_2}{N_1 + N_2}$$
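
The weighted mean of formula (3) may be sketched as follows. The function name `combined_mean` is hypothetical; the example is the one given above, in which groups of 10 and 40 children have means of 62 and 66.

```python
def combined_mean(sizes, means):
    """Weighted mean of several groups: each M is weighted by its N."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)


print(combined_mean([10, 40], [62, 66]))  # 65.2
```

Note that the simple average of the two means, 64, would be wrong here, since it ignores the fact that the second group is four times as large as the first.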


2. The median (Mdn) * 


(1) CALCULATION OF THE MEDIAN WHEN DATA ARE UNGROUPED 

When ungrouped scores or other measures are arranged in order of
size, the median is the midpoint in the series. Two situations arise in
the computation of the median from ungrouped data: (a) when N is
odd, and (b) when N is even. To consider, first, the case where N is
odd, suppose we have the following integral "mental ages": 7, 10, 8,
12, 9, 11, 7, calculated from seven performance tests. If we arrange
these seven scores in order of size


7    7    8    (9)    10    11    12


the median is 9.0 since 9.0 is the midpoint of that score which lies
midway in the series. Calculation is as follows: There are three
scores above, and three below 9, and since a score of 9 covers the
interval 8.5 to 9.5, its midpoint is 9.0. This is the median.

Now if we drop the first score of 7 our series contains six scores


               9.5
7    8    9     |     10    11    12


and the median is 9.5. Counting three scores in from the beginning of
the series, we complete score 9 (which is 8.5 to 9.5) to reach 9.5, the
upper limit of score 9. In like manner, counting three scores in from
the end of the series, we move through score 10 (10.5 to 9.5) reaching
9.5, the lower limit of score 10.

* The median is also designated as Md.

A formula for finding the median of a series of ungrouped scores is

$$\text{Median} = \text{the } \frac{(N + 1)}{2}\text{th measure in order of size} \tag{4}$$
(median from ungrouped data)


In our first illustration above, the median is on the (7 + 1)/2 or 4th
score counting in from either end of the series, that is, 9.0 (midpoint
8.5 to 9.5). In our second illustration, the median is on the (6 + 1)/2
or 3.5th score in order of size, that is, 9.5 (upper limit of score 9, or
lower limit of score 10).
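
A short sketch of formula (4) in Python follows. The function name `median_ungrouped` is hypothetical. When N is even, the sketch takes the point midway between the two middle scores; this is an assumption which agrees with the text when the two middle scores are adjacent integers, as 9 and 10 are above.

```python
def median_ungrouped(scores):
    """Median of ungrouped scores: the (N + 1)/2-th measure in order of size."""
    ordered = sorted(scores)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # N odd: the (N + 1)/2-th score itself lies midway in the series.
        return float(ordered[middle])
    # N even: the position falls between two scores; take the point
    # midway between them (assumed convention).
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median_ungrouped([7, 10, 8, 12, 9, 11, 7]))  # 9.0
print(median_ungrouped([10, 8, 12, 9, 11, 7]))     # 9.5
```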


(2) CALCULATION OF THE MEDIAN WHEN DATA ARE GROUPED INTO A
FREQUENCY DISTRIBUTION

When scores in a continuous series are grouped into a frequency
distribution, the median by definition is the 50% point in the
distribution. To locate the median, therefore, we take 50% (i.e., N/2)
of our scores, and count into the distribution until the 50% point is
reached. The method is illustrated in the two examples in Table 5.
Since there are 50 scores in the first distribution, N/2 = 25, and the
median is that point in our distribution of Army Alpha scores which
has 25 scores on each side of it. Beginning at the small-score end of
the distribution, and adding up the scores in order, we find that
intervals 140-144 to 165-169, inclusive, contain just 20 f's, five
scores short of the 25 necessary to locate the median. The next
interval, 170-174, contains 10 scores assumed to be spread evenly
over the interval (p. 7). In order to get the five extra scores needed
to make exactly 25, we take 5/10 × 5 (the length of the interval)
and add this increment (2.5) to 169.5, the beginning of the interval
170-174. This puts the Mdn at 169.5 + 2.5 or at 172.0. The student
should note carefully that the median, like the mean, is a point and
not a score.

A second illustration of the calculation of the median from data
grouped into a frequency distribution is given in Table 5 (2). There
are 200 scores in this distribution; hence, N/2 = 100, and the median
is that point which has 100 scores on each side of it in the
distribution. If we begin at the small-score end of the distribution (103.5 to
107.5) and add the scores in order, 52 scores take us through the
interval 111.5 to 115.5. The 49 scores on the next interval (115.5 to
119.5) plus the 52 already counted off total 101, one score too many
to give us 100, the point at which the median falls. To get the 48
scores needed to make exactly 100 we must take 48/49 × 4 (the
length of the interval) and add this amount (3.92) to 115.5, the
beginning of interval 115.5 to 119.5. This procedure takes us
exactly 100 scores into the distribution, and locates the median at
119.42.

A formula for calculating the Mdn when the data have been classified
into a frequency distribution is

$$\text{Mdn} = l + \left(\frac{N/2 - F}{f_m}\right) i \qquad (5)$$
(median computed from data grouped into a frequency distribution)

where
l = lower limit of the class-interval upon which the median lies
N/2 = one-half the total number of scores
F = sum of the scores on all intervals below l
fm = frequency (number of scores) within the interval upon which
the median falls
i = length of the class-interval


To illustrate the use of formula (5), consider the first example in
Table 5, page 30. Here l = 169.5, N/2 = 25, F = 20, fm = 10, and
i = 5. Hence, the median falls at 169.5 + (25 − 20)/10 × 5 or at 172.0.
In the second example, l = 115.5, N/2 = 100, F = 52, fm = 49, and
i = 4. The median, therefore, is 115.5 + (100 − 52)/49 × 4 or 119.42.


The steps involved in computing the Mdn from data tabulated
into a frequency distribution may be summarized as follows:


(1) Find N/2, that is, one-half of the cases in the distribution.

(2) Begin at the small-score end of the distribution and count off the
scores in order up to the lower limit (l) of the interval which
contains the median. The sum of these scores is F.

(3) Compute the number of scores necessary to fill out N/2, i.e.,
compute N/2 − F. Divide this quantity by the frequency (fm)
on the interval which contains the median; and multiply the
result by the size of the class-interval (i).

(4) Add the amount obtained by the calculations in (3) to the lower
limit (l) of the interval which contains the Mdn. This will give
the median of the distribution.
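The four steps above can be sketched in Python as follows. The function name and argument layout are illustrative assumptions; the frequencies are those of the two distributions of Table 5, as they are repeated in Tables 7 and 8, listed from the lowest interval upward.

```python
def grouped_median(lower_limits, freqs, width):
    """Median of a frequency distribution by formula (5).

    lower_limits: exact lower limit (l) of each interval, lowest first
    freqs:        frequency on each interval, in the same order
    width:        length of the class-interval (i)
    """
    half = sum(freqs) / 2             # step (1): N/2
    below = 0                         # step (2): F, scores below the interval
    for limit, f in zip(lower_limits, freqs):
        if f > 0 and below + f >= half:
            # steps (3) and (4): l + (N/2 - F) / fm * i
            return limit + (half - below) / f * width
        below += f
    raise ValueError("empty distribution")


# Table 5 (1): 50 Army Alpha scores, intervals 140-144 up to 195-199
alpha_limits = [139.5 + 5 * k for k in range(12)]
alpha_freqs = [1, 3, 2, 4, 4, 6, 10, 8, 5, 4, 2, 1]

# Table 5 (2): 200 cancellation scores, intervals 103.5-107.5 up to 135.5-139.5
cancel_limits = [103.5 + 4 * k for k in range(9)]
cancel_freqs = [7, 18, 27, 49, 52, 23, 16, 5, 3]

print(grouped_median(alpha_limits, alpha_freqs, 5))     # 172.0
print(grouped_median(cancel_limits, cancel_freqs, 4))   # about 119.42
```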


The median may also be computed by adding up one-half of the
scores from the top down in a frequency distribution. The procedure
is the same through step (3) in the summary above. When we count
down from the top of the distribution, however, the quantity found
in step (3) must be subtracted from the upper limit of the interval
containing the median. To illustrate with the data of Table 5 (1),
page 30, counting down in the f-column, 20 scores complete interval
175-179, and we reach 174.5, the upper limit of the interval 170-174.
Five scores of the 10 on this interval are needed to make 25 (N/2).
Hence we have 174.5 − 5/10 × 5 = 172.0, which checks our first
calculation of the median. In Table 5 (2), the median found by
counting down is 119.5 − 1/49 × 4 or 119.42.


(3) CALCULATION OF THE MDN WHEN (a) THE FREQUENCY DISTRIBUTION
CONTAINS GAPS; AND WHEN (b) THE FIRST OR LAST INTERVAL
HAS INDETERMINATE LIMITS

(a) Difficulty arises when it becomes necessary to calculate the
median from a distribution in which there are gaps or zero frequency
upon one or more intervals. The method to be followed in such cases
is shown in Table 6 below.


TABLE 6 Computation of the median when there are gaps in the
distribution

Class-Intervals
Scores            f
20-21             2
18-19             1
16-17             0
14-15             0
12-13             2 }
10-11             0 } 10-13
 8-9              0 }
 6-7              2 } 6-9
 4-5              1
 2-3              1
 0-1              1
              N = 10
            N/2 = 5
Mdn = 9.5 + 0/2 × 4 = 9.5


Since N = 10, and N/2 = 5, we count up the frequency column five
scores through 6-7. Ordinarily, this would put the median at 7.5,
the lower limit of interval 8-9. If we check
this median, however, by counting down the frequency column five
scores, the median falls at 11.5, the lower limit of 12-13. Obviously,
the discrepancy between these two values of the median is due to the
two intervals 8-9 and 10-11 (each of which has zero frequency)
which lie between 6-7 and 12-13. In order to have the median come
out at the same point, whether computed from the top or the bottom
of the frequency distribution, the procedure usually followed in cases
like this is to have interval 6-7 include 8-9, thus becoming 6-9; and
to have interval 12-13 include 10-11, becoming 10-13. Lengthening
these intervals from two to four units eliminates the zero frequency
on the adjacent intervals by spreading the numerical frequency over
them. If now we count off five scores, going up the frequency column
through 6-9, the median falls at 9.5, the upper limit of this interval.
Also, counting down the frequency column five scores, we arrive at a
median value of 9.5, the upper limit of 6-9, or the lower limit of
10-13. Computation from the two ends of the series now gives
consistent results: the median is 9.5 in both instances.
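The effect of the gaps, and of the lengthened intervals, can be checked with a short Python sketch. The function names and the (lower limit, upper limit, f) layout are illustrative assumptions; the frequencies are those of Table 6, with the zero on interval 14-15 inferred from N = 10.

```python
def count_up(intervals):
    """Median found by counting N/2 scores up from the low end."""
    half = sum(f for _, _, f in intervals) / 2
    below = 0
    for lower, upper, f in intervals:
        if f > 0 and below + f >= half:
            return lower + (half - below) / f * (upper - lower)
        below += f


def count_down(intervals):
    """Median found by counting N/2 scores down from the high end."""
    half = sum(f for _, _, f in intervals) / 2
    above = 0
    for lower, upper, f in reversed(intervals):
        if f > 0 and above + f >= half:
            return upper - (half - above) / f * (upper - lower)
        above += f


# Table 6 as tabulated: (exact lower limit, exact upper limit, f), lowest first
with_gaps = [(-0.5, 1.5, 1), (1.5, 3.5, 1), (3.5, 5.5, 1), (5.5, 7.5, 2),
             (7.5, 9.5, 0), (9.5, 11.5, 0), (11.5, 13.5, 2), (13.5, 15.5, 0),
             (15.5, 17.5, 0), (17.5, 19.5, 1), (19.5, 21.5, 2)]

# Same data with 6-7 lengthened to 6-9 and 12-13 lengthened to 10-13
merged = [(-0.5, 1.5, 1), (1.5, 3.5, 1), (3.5, 5.5, 1), (5.5, 9.5, 2),
          (9.5, 13.5, 2), (13.5, 15.5, 0), (15.5, 17.5, 0),
          (17.5, 19.5, 1), (19.5, 21.5, 2)]

print(count_up(with_gaps), count_down(with_gaps))   # 7.5 11.5
print(count_up(merged), count_down(merged))         # 9.5 9.5
```

With the original intervals the two directions disagree (7.5 against 11.5); after the intervals are lengthened both give 9.5.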

(b) When scores scatter widely, the last class-interval in a fre- 
quency distribution may be designated as “80 and above” or simply 
as 80+. This means that all scores above 80 are thrown into this 
interval, the upper limit of which is indeterminate. The same lump- 
ing together of scores may also occur at the beginning of the distribu- 
tion, when the first interval, for example, is designated “20 and 
below” or 20—. The lower limit of the beginning class-interval is 
now indeterminate. In irregular distributions like these, the median
is readily computed since each score is simply counted as one
frequency whether accurately classified or not. But it is impossible to
calculate the mean exactly when the midpoint of one or more
intervals is unknown. The mean depends upon the absolute size of the
scores (or their midpoints) and is directly affected by indeterminate
interval limits.


3. The mode 


In a simple ungrouped series of measures the "crude" or "empirical"
mode is that single measure or score which occurs most frequently.
For example, in the series 10, 11, 11, 12, 12, 13, 13, 13, 14, 14,
the most often recurring measure, namely 13, is the crude or
empirical mode. When data are grouped into a frequency distribution,
the crude mode is usually taken to be the midpoint of that interval
which contains the largest frequency. In example 1, Table 5, page 30,
the interval 170-174 contains the largest frequency and hence 172.0,
its midpoint, is the crude mode. In example 2, Table 5, the largest
frequency falls on 119.5 to 123.5 and the crude mode is at 121.5, the
midpoint.

When calculating the mode from a frequency distribution, we
distinguish between the "true" mode and the crude mode. The true
mode is the point (or "peak") of greatest concentration in the
distribution; that is, the point at which more measures fall than at any
other point. When the scale is divided into finely graduated units,
when scores are recorded exactly, and when N is large, the crude
mode closely approaches the true mode. Ordinarily, however, the
crude mode is only approximately equal to the true mode. A formula
for approximating the true mode, when the frequency distribution is
symmetrical, or at least not badly skewed (page 97), is

$$\text{Mode} = 3\,\text{Mdn} - 2\,\text{Mean} \qquad (6)$$
(approximation to the true mode calculated from a frequency
distribution)


If we apply this formula to the data in Table 5, the mode is 174.40
for the first distribution, and 119.38 for the second. The first mode
is somewhat larger and the second slightly smaller than the crude
modes obtained from the same distributions.
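Both kinds of mode can be sketched in a few lines of Python. The function names are illustrative; the medians (172.0 and 119.42) and means (170.80 and 119.44) are the values found for the two distributions of Table 5, and the frequencies are listed from the lowest interval upward.

```python
def crude_mode(midpoints, freqs):
    """Midpoint of the interval holding the largest frequency."""
    return midpoints[freqs.index(max(freqs))]


def approx_true_mode(median, mean):
    """Formula (6): Mode = 3 Mdn - 2 Mean."""
    return 3 * median - 2 * mean


# Table 5 (1): midpoints 142, 147, ..., 197
alpha_mid = [142 + 5 * k for k in range(12)]
alpha_f = [1, 3, 2, 4, 4, 6, 10, 8, 5, 4, 2, 1]

# Table 5 (2): midpoints 105.5, 109.5, ..., 137.5
cancel_mid = [105.5 + 4 * k for k in range(9)]
cancel_f = [7, 18, 27, 49, 52, 23, 16, 5, 3]

print(crude_mode(alpha_mid, alpha_f), round(approx_true_mode(172.0, 170.80), 2))
print(crude_mode(cancel_mid, cancel_f), round(approx_true_mode(119.42, 119.44), 2))
```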

The crude mode is often an unstable measure of central tendency.
This instability is not, however, so serious a drawback as might seem
at first glance. The crude mode is usually employed as a simple,
inspectional "average," to indicate in a rough way the center of
concentration in the distribution. For this purpose it need not be
calculated as exactly as the median or mean.


II. Calculation of the Mean by the "Assumed Mean"
or Short Method


In Table 5, page 30, the mean was calculated by multiplying the
midpoint (X) of each interval by the frequency (number of scores)
on the interval, summing up these values (the fX column) and
dividing by N, the number of scores. This straightforward method (called
the Long Method) gives accurate results but often requires the
handling of large numbers and entails tedious calculation. Because
of this, the "Assumed Mean" method, or simply the Short Method,
has been devised for computing the mean. The Short Method does
not apply to the calculation of the median or the mode. These
measures are always found by the methods previously described.

The most important fact to remember in calculating the mean by
the Short Method is that we "guess" or "assume" a mean at the
outset, and later apply a correction to this assumed value (AM) in
order to obtain the actual mean (M) (see Table 7, below).


TABLE 7 The calculation of the mean by the short method
(Data from Table 1, 50 Army Alpha scores)

(1)               (2)         (3)    (4)    (5)
Class-Intervals   Midpoint
Scores            X           f      x'     fx'
195-199           197          1      5       5
190-194           192          2      4       8
185-189           187          4      3      12
180-184           182          5      2      10
175-179           177          8      1       8
170-174           172         10      0     +43
165-169           167          6     −1     − 6
160-164           162          4     −2     − 8
155-159           157          4     −3     −12
150-154           152          2     −4     − 8
145-149           147          3     −5     −15
140-144           142          1     −6     − 6
                          N = 50            −55

AM = 172.00        c = −12/50 = −.240
ci = − 1.20        i = 5
M  = 170.80        ci = −.240 × 5 = −1.20


There is no set rule for assuming a mean.* The best plan is to take the
midpoint of an interval somewhere near the center of the
distribution; and if possible the midpoint of that interval which contains the
largest frequency. In Table 7, the largest f is on interval 170-174,
which also happens to be almost in the center of the distribution.
Hence the AM is taken at 172.0, the middle of this interval. When
the question of the AM is settled, we determine the correction which
must be applied to the AM in order to get M. Steps are as follows:


(1) First, we fill in the x' column,† column (4). Here are entered the
deviations of the midpoints of the different steps measured from
the AM in units of class-interval. Thus 177, the midpoint of
175-179, deviates from 172, the AM, by one interval; and a "1"
is placed in the x' column opposite 177. In like manner, 182
deviates two intervals from 172; and a "2" goes in the x' column
opposite 182. Reading on up the x' column from 172, we find the
succeeding entries to be 3, 4, and 5. The last entry, 5, is the
interval-deviation of 197 from 172; the actual score-deviation, of
course, is 25.

Returning to 172, we find that the x' of this midpoint
measured from the AM (from itself) is zero; hence a zero is placed
in the x' column opposite 170-174. Below 172, all of the x'
entries are negative, since all of the midpoints are less than 172,
the AM. So the x' of 167 from 172 is −1 interval; and the x' of
162 from 172 is −2 intervals. The other x''s are −3, −4, −5,
and −6 intervals.

(2) The x' column completed, we compute the fx' column, column
(5). The fx' entries are found in exactly the same way as are the
fX in Table 5, page 30. Each x' in column (4) is multiplied or
"weighted" by the appropriate f in column (3). Note again that
in the Short Method we multiply each f by the deviation of its
interval from the AM in units of class-interval, instead of by its
actual deviation from the mean of the distribution. For this reason, the
computation of the fx' column is much more simple than is the
calculation of the fX column by the method given on page 29.
All of the fx' on intervals above (greater than) the AM are
positive; and all fx' on intervals below (smaller than) the AM are
negative, since the signs of the fx' depend upon the signs of
the x'.

(3) From the fx' column the correction is obtained as follows: The
sum of the positive values in the fx' column is 43; and the sum
of the negative values in the fx' column is −55. There are,
therefore, 12 more minus fx' values than plus (the algebraic sum is
−12); and −12 divided by 50 (N) gives −.240 which is the
correction (c) in units of class-interval. If we multiply c
(−.240) by i, the length of the interval (here 5), the result is ci
(−1.20) the score correction, or the correction in score units.
When −1.20 is added to 172.00, the AM, the result is the actual
mean, 170.80.


* The method outlined here gives consistent results no matter where the
mean is tentatively placed or assumed.
† x' is regularly used to denote the deviation of a score X from the assumed
mean (AM); x is the deviation of a score X from the actual mean (M) of the
distribution.
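The computation in Table 7 can be sketched in Python as below. The function name is an illustrative assumption; midpoints and frequencies are those of Table 7, listed from the lowest interval upward. The second call moves the assumed mean to 177 to illustrate the footnote's remark that the result does not depend on where the mean is assumed.

```python
def mean_short_method(midpoints, freqs, width, assumed_mean):
    """Mean of a frequency distribution by the Assumed Mean method."""
    n = sum(freqs)
    # x': deviation of each midpoint from the AM in units of class-interval
    x_prime = [(x - assumed_mean) / width for x in midpoints]
    # algebraic sum of the fx' column
    sum_fx_prime = sum(f * d for f, d in zip(freqs, x_prime))
    c = sum_fx_prime / n              # correction in interval units
    return assumed_mean + c * width   # AM + ci


midpoints = [142 + 5 * k for k in range(12)]      # 142, 147, ..., 197
freqs = [1, 3, 2, 4, 4, 6, 10, 8, 5, 4, 2, 1]     # N = 50

print(round(mean_short_method(midpoints, freqs, 5, 172), 2))   # 170.8
print(round(mean_short_method(midpoints, freqs, 5, 177), 2))   # 170.8
```

With the AM at 172 the algebraic sum of the fx' is −12 and c is −.240, as in Table 7; with the AM at 177 the sum becomes −62 and c is −1.24, but the mean is again 170.80.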


The process of calculating the mean by the Short Method may be
summarized as follows:

(1) Tabulate the scores or measures into a frequency distribution.

(2) "Assume" a mean as near the center of the distribution as
possible, and preferably on the interval containing the largest
frequency.

(3) Find the deviation of the midpoint of each class-interval from
the AM in units of interval.

(4) Multiply or weight each deviation (x') by its appropriate f,
the f opposite it.

(5) Find the algebraic sum of the plus and minus fx' and divide this
sum by N, the number of cases. This gives c, the correction in
units of class-interval.

(6) Multiply c by the interval length (i) to get ci, the score
correction.

(7) Add ci algebraically to the AM to get the actual mean.
Sometimes ci will be positive and sometimes negative, depending
upon where the mean has been assumed. The method works
equally well in either case.


III. When To Use the Various Measures of Central Tendency


The beginning student of statistics is often puzzled to know which 
measure of central tendency to use in a given problem. The follow- 
ing will serve as a convenient summary. 


1. Use the mean

(1) When each score or measure should have equal weight in
determining the central tendency. Since the mean is the sum
of the scores divided by their number, each score has equal
weight in its determination.

(2) When the measure of central tendency having the highest
reliability is desired (p. 194).

(3) When standard deviations and product-moment coefficients
of correlation are to be subsequently computed (p. 138).


2. Use the median

(1) When a quick and easily computed measure of central
tendency is wanted.

(2) When there are extreme measures which would affect the
mean disproportionately (p. 34).

(3) When it is desired that certain scores should influence the
central tendency but all that is known about them is that
they are above or below the median (p. 35).


3. Use the mode

(1) When the most often recurring or "popular" score is sought.
(2) When a quick approximate measure of concentration is all
that is wanted.


PROBLEMS 


1. Calculate the mean, median, and mode for the following frequency
distributions. Use the Short Method in computing the mean.


(1) Scores     f        (2) Scores     f
    70-71      2            90-94      2
    68-69      2            85-89      2
    66-67      3            80-84      4
    64-65      4            75-79      8
    62-63      6            70-74      6
    60-61      7            65-69     11
    58-59      5            60-64      9
    56-57      4            55-59      7
    54-55      2            50-54      5
    52-53      3            45-49      0
    50-51      1            40-44      2
           N = 39                  N = 56

(3) Scores     f        (4) Scores     f
    120-122    2            100-109    5
    117-119    2            90-99      9
    114-116    2            80-89     14
    111-113    4            70-79     19
    108-110    5            60-69     21
    105-107    9            50-59     30
    102-104    6            40-49     25
    99-101     3            30-39     15
    96-98      4            20-29     10
    93-95      2            10-19      8
    90-92      1            0-9        6

(5) Scores     f        (6) Scores     f
    120-139    50           15         1
    100-119   150           14         2
    80-99     500           13         3
    60-79     250           12         6
    40-59      50           11        12
         N = 1000           10        15
                             9        22
                             8        31
                             7        18
                             6         6
                             5         2
                             4         2
                                 N = 120


2. Compute the mean and the median for each of the two distributions in 
problem 5(a), page 24, tabulated in 3- and 5-unit intervals. Compare 
the two means and the two medians, and explain any discrepancy found. 
(Let the first interval in the first distribution be 61-63; the first interval 
in the second distribution, 60-64.) 


3. (a) The same test is given to the three sections of Grade VI. Results are:
Section I, M = 24, N = 32; Section II, M = 31, N = 54; Section
III, M = 35, N = 16. What is the general mean for the grade?
(b) The mean score on AGCT in Camp A is 102, N = 1500; and in
Camp B 106, N = 450. What is the mean for Camps A and B
combined?


4. (a) Compute the median of the following 16 scores by the method of
p. 84.
Scores f
20 up to 22
18 up to 20
16 up to 18
14 up to 16
12 up to 14
10 up to 12
8 up to 10
6 up to 8
4 up to 6
2 up to 4
0 up to 2
(b) In a group of 50 children, the 8 children who took longer than 5
minutes to complete a performance test were marked D.N.C. (did
not complete). In computing a measure of central tendency for this
distribution of scores, what measure would you use, and why?

(c) Find the medians of the following arrays of ungrouped scores by 
formula (4) p. 32: 

(1) 21, 24, 27, 29, 29, 30, 32, 33, 35, 38, 42, 45. 
(2) 54, 59, 64, 67, 70, 72, 73, 75, 78, 83, 90. 
(3) 7,8,9,9, 10, 11. 


5. The time by your watch is 10:31 o'clock. In checking with two friends,
you find that their watches give the time as 10:25 and 10:34. Assuming
that the three watches are equally good timepieces, what do you think
is probably the "correct time"?

6. What is meant popularly by the “law of averages”? 

7. (a) When one uses the term “in the mode” does he have reference to the 

mode of a distribution? 
(b) What is approximately the modal time for each of the following 
meals: breakfast, lunch, dinner. Explain your answers. 
(c) Why is the median usually the best measure of the typical contribu- 
tion in a church collection? 
ANSWERS

1. (1) Mean   = 60.76       (2) Mean   = 67.36
       Median = 60.79           Median = 66.77
       Mode   = 60.85           Mode   = 65.59
   (3) Mean   = 106.00      (4) Mean   = 55.43
       Median = 105.83          Median = 55.17
       Mode   = 105.49          Mode   = 54.65
   (5) Mean   = 87.5        (6) Mean   = 8.85
       Median = 87.5            Median = 8.55
       Mode   = 87.5            Mode   = 7.95
2. Class-interval = 3       Class-interval = 5
   Mean   = 72.92           Mean   = 73.00
   Median = 71.75           Median = 72.71
3. (a) 29.43  (b) 103 (to the nearest whole number)
4. (a) Median = 11.5
   (c) (1) Median = 31.0
       (2) Median = 72.0
       (3) Median = 9.0
5. Mean is 10:30.


MEASURES OF VARIABILITY



In Chapter 2 the calculation of three measures of central tendency 
—measures typical or representative of a set of scores as a whole— 
was described. Ordinarily, the next step is to find some measure of 
the variability of our scores, i.e., of the “scatter” or “spread” of the 
separate scores or measures around their central tendency. It will be 
the task of this chapter to show how measures of variability may be 
computed. 

The usefulness of a measure of variability can be seen from a 
simple example. Suppose a test of controlled association has been 
administered to a group of 50 boys and to a group of 50 girls. The 
mean scores are, boys, 34.6 seconds, and girls, 34.5 seconds. So far 
as the means go there is no difference in the performance of the two 
groups. But suppose the boys’ scores are found to range from 15 to 
51 seconds and the girls’ scores from 19 to 45 seconds. This difference 
in range shows that in a general way the boys “cover more territory,” 
are more variable, than the girls; and this greater variability may be 
of more interest than the lack of a difference in the means. If a group 
is homogeneous, that is, made up of individuals of nearly the same 
ability, most of the scores will fall around the same point on the 
scale, the range will be relatively short, and the variability will be
small. But if the group contains individuals of widely differing
capacities, scores will be strung out from high to low, the range will
be relatively wide, and the variability large.

This situation is represented graphically in Figure 7, which shows
two frequency distributions of the same area (N) and same mean
(50) but of very different variability. Group A ranges from 20 to 80,
and Group B from 40 to 60. Group A is three times as variable as
Group B, that is, it spreads over three times the distance on the
scale of scores, though both distributions have the same central
tendency.


FIG. 7 Two distributions of the same area (N) and mean (50) but of
very different variability


Four measures have been devised to indicate the variability or 
dispersion within a set of measures. These are (1) the range, (2) the 
quartile deviation or Q, (3) the mean deviation or MD, and (4) the 
standard deviation or SD. 


I. Calculation of Measures of Variability 


1. The range 


In grouping the scores in Table 1 into a frequency distribution
(p. 5) we have already had occasion to use the range. It may be
redefined simply as the interval between the largest and the smallest
scores. In the illustration above, the range of boys' scores was 51 − 15
or 36 seconds and the range of girls' scores 45 − 19 or 26 seconds. The
range is the most general measure of spread or scatter, and is
computed when we wish to make a rough comparison of two or more
groups for variability. Since the range takes account of the extremes
of the series only, it is unreliable when N is small or when many or
large gaps (i.e., zero f's) occur in the frequency distribution.


2. The quartile deviation or Q


The quartile deviation or Q is one-half the distance between the
75th and 25th percentiles in a frequency distribution. The 25th
percentile or Q1 is the first quartile on the score-scale, the point below
which lie 25% of the scores. The 75th percentile or Q3 is the third
quartile on the score-scale, the point below which lie 75% of the
scores.*

To find Q, we must first calculate the 75th and 25th percentiles.
These values are found by exactly the same method employed in
calculating the median. To find Q1, count off 25% of the scores from the
beginning of the distribution (low end); and to find Q3 count off 75%
of the scores from the low end of the distribution, or 25% from the
high end.

Table 8 illustrates the calculation of Q for the distribution of fifty
Alpha scores tabulated in Table 1. First, to find Q1, count off 1/4
of N (12.5) from the low-score end of the distribution. When
the scores (f) are added in order, the first four class-intervals
(140-144 to 155-159, inclusive) are found to contain 10 scores. The next
interval, 160-164, contains four scores, assumed to be spread evenly
over the interval. Since we need only 2.5 additional scores to make
up the necessary 12.5, take 2.5/4 × 5 (the interval) and add this
amount, viz., 3.13, to 159.50, the beginning of the interval which
contains Q1. This calculation locates Q1 at 162.63 (see Table 8).

Q3 is found in the same way by counting off 3/4 of N (37.5) from
the small-score end of the distribution. The f's on 140-144 to
170-174, inclusive, added in order, total 30. The next interval, 175-179,
contains 8 scores. To make up the necessary 37.5, therefore, take
7.5/8 × 5 (interval) and add this amount (viz., 4.69) to 174.50. This
puts Q3 at 179.19 (see Table 8).


TABLE 8 The calculation of the Q, MD and SD from data grouped into
a frequency distribution

(Figures in parentheses beside f are the scores counted up to that
point from the low end of the distribution.)


1. Data from Table 1, page 5, 50 Army Alpha scores

(1)               (2)        (3)          (4)        (5)        (6)
Class-Intervals   Midpoint
Scores            X          f            x          fx         fx²
195-199           197         1          26.20      26.20     686.44
190-194           192         2          21.20      42.40     898.88
185-189           187         4          16.20      64.80    1049.76
180-184           182         5          11.20      56.00     627.20
175-179           177         8           6.20      49.60     307.52
170-174           172        10 (30)      1.20      12.00      14.40
165-169           167         6         − 3.80    − 22.80      86.64
160-164           162         4         − 8.80    − 35.20     309.76
155-159           157         4 (10)    −13.80    − 55.20     761.76
150-154           152         2         −18.80    − 37.60     706.88
145-149           147         3         −23.80    − 71.40    1699.32
140-144           142         1         −28.80    − 28.80     829.44
                         N = 50                    502.00    7978.00

Mean = 170.80 (Table 5, p. 30)
N/4 = 12.5 and 3N/4 = 37.5
Q1 = 159.5 + (2.5/4) × 5 = 162.63      Q3 = 174.5 + (7.5/8) × 5 = 179.19
Q = (Q3 − Q1)/2 = (179.19 − 162.63)/2 = 8.28
MD = Σ|fx|/N = 502.00/50 = 10.04
SD = √(Σfx²/N) = √(7978.00/50) = 12.63


2. Data from Table 3, p. 13, 200 cancellation scores

(1)               (2)        (3)          (4)        (5)        (6)
Class-Intervals   Midpoint
Scores            X          f            x          fx         fx²
135.5 to 139.5    137.5       3          18.06      54.18     978.49
131.5 to 135.5    133.5       5          14.06      70.30     988.42
127.5 to 131.5    129.5      16          10.06     160.96    1619.26
123.5 to 127.5    125.5      23           6.06     139.38     844.64
119.5 to 123.5    121.5      52           2.06     107.12     220.67
115.5 to 119.5    117.5      49 (101)   − 1.94    − 95.06     184.42
111.5 to 115.5    113.5      27         − 5.94    −160.38     952.66
107.5 to 111.5    109.5      18 (25)    − 9.94    −178.92    1778.46
103.5 to 107.5    105.5       7         −13.94    − 97.58    1360.27
                        N = 200                   1063.88    8927.29

Mean = 119.44 (Table 5)
N/4 = 50 and 3N/4 = 150
Q1 = 111.5 + (25/27) × 4 = 115.20      Q3 = 119.5 + (49/52) × 4 = 123.27
Q = (Q3 − Q1)/2 = (123.27 − 115.20)/2 = 4.04
MD = Σ|fx|/N = 1063.88/200 = 5.32
SD = √(Σfx²/N) = √(8927.29/200) = 6.68


* It may be noted that the second quartile, Q2, is the median.


When Q1 and Q3 are known, Q, the quartile deviation, is found
from the formula

$$Q = \frac{Q_3 - Q_1}{2} \qquad (7)$$
(quartile deviation calculated from grouped data)


In the present problem, Q = (179.19 − 162.63)/2 or 8.28.

A second illustration of the calculation of Q from a frequency
distribution is given in Table 8, example 2. Since the N of this
distribution is 200, 1/4 of N equals 50. The intervals 103.5 to 107.5 and
107.5 to 111.5 contain 25 scores; and the next interval, 111.5 to 115.5,
contains 27 scores, which makes a total of 52, two more than the 50
wanted. To find the point reached by just 50 scores, take 25/27 × 4
(the interval) and add this amount (3.70) to 111.50, the lower limit
of 111.5 to 115.5. This locates Q1 at 115.20.

To find Q3 count off 3/4 of N or 150 scores from the small-score
end of the distribution. The first four intervals include 101 scores,
and the next interval, 119.5 to 123.5, contains 52 scores. To fill out
the required 150, take 49/52 × 4, the length of the interval, and add
this increment (3.77) to 119.50, to locate Q3 at 123.27. Substituting
115.20 for Q1 and 123.27 for Q3 in formula (7) we get a Q of 4.04.

The quartiles Q1 and Q3 mark off the limits of the middle 50% of
scores in the distribution and the distance between these points is
called the interquartile range. Q is one-half the range of the middle
50% or the semi-interquartile range. Since Q measures the average
distance of the quartile points from the median, it is a good measure
of score density around the middle of the distribution. If the scores
of a distribution are packed closely together the quartiles will be
near to one another and Q will be small; if the scores are widely
scattered, the quartiles will be relatively far apart, and Q will be
large (see Fig. 7, p. 44).

When the distribution is asymmetrical or "skewed," Q1 and Q3
are at unequal distances from the median, and the difference between
(Q3 − Mdn) and (Mdn − Q1) gives a measure of the amount and
direction of the skewness (p. 98). When the distribution is
symmetrical or normal, Q marks off exactly the 25% of cases just above,
and the 25% of cases just below, the median. The median then lies
just halfway between the two quartiles Q1 and Q3. In a normal
distribution Q becomes the PE (probable error). The terms Q and
PE are often used interchangeably, but it is best to restrict the use
of the term PE to the normal probability curve (p. 97).

Steps in calculating Q may be summarized as follows:


To find Q1
(1) Divide N by 4.
(2) Begin at the low-score end of the distribution, and count off the
scores up to the interval which contains Q1.
(3) Divide the number of scores necessary to locate Q1 (i.e., to complete
N/4) by the frequency in the interval reached in (2) above, and
multiply the result by the class-interval.
(4) Add the amount obtained in (3) to the lower limit of the
class-interval within which Q1 lies. This gives Q1.


To find Q3
(1) Find 3/4 of N.
(2) Begin at the low-score * end of the distribution, and count up the
scores until the interval which contains Q3 is reached.
(3) Divide the number of scores required to locate Q3 by the frequency
within the interval reached in (2) and multiply the result by the
class-interval.
(4) Add the amount obtained in (3) to the lower limit of the
class-interval within which Q3 lies. This gives Q3.


To find Q
Substitute Q1 and Q3 in formula (7).
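The steps for Q1, Q3 and Q can be sketched in Python as below. The function names are illustrative assumptions, and the routine is simply the median procedure with N/2 replaced by N/4 or 3N/4. The frequencies are those of Table 8, listed from the lowest interval upward.

```python
def point_below(lower_limits, freqs, width, fraction):
    """Score-point below which the given fraction of the scores lies."""
    target = sum(freqs) * fraction    # N/4 for Q1, 3N/4 for Q3
    below = 0                         # scores counted off so far
    for limit, f in zip(lower_limits, freqs):
        if f > 0 and below + f >= target:
            return limit + (target - below) / f * width
        below += f
    raise ValueError("empty distribution")


def quartile_deviation(lower_limits, freqs, width):
    """Return (Q1, Q3, Q), with Q from formula (7)."""
    q1 = point_below(lower_limits, freqs, width, 0.25)
    q3 = point_below(lower_limits, freqs, width, 0.75)
    return q1, q3, (q3 - q1) / 2


# Table 8, example 1: 50 Army Alpha scores
alpha_limits = [139.5 + 5 * k for k in range(12)]
alpha_freqs = [1, 3, 2, 4, 4, 6, 10, 8, 5, 4, 2, 1]

# Table 8, example 2: 200 cancellation scores
cancel_limits = [103.5 + 4 * k for k in range(9)]
cancel_freqs = [7, 18, 27, 49, 52, 23, 16, 5, 3]

print(quartile_deviation(alpha_limits, alpha_freqs, 5))
print(quartile_deviation(cancel_limits, cancel_freqs, 4))
```

The unrounded values agree with the text to within rounding. In the second example the sketch gives a Q of about 4.03, whereas 4.04 results when the quartiles are first rounded to 115.20 and 123.27.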


3. The Mean Deviation or MD 


(1) CALCULATION OF MD FROM UNGROUPED DATA

The mean deviation or MD (also written average deviation or AD
and mean variation or MV) is the mean of the deviations of all the
separate measures in a series taken from their central tendency
(usually the arithmetic mean; less frequently the median or mode). In
averaging deviations to find the MD, no account is taken of signs,
and all deviations whether positive or negative are treated as
positive.

An example will make our definition clearer. If we have five
scores, 6, 8, 10, 12, and 14, the mean is easily found to be 10. It is
then a simple process to find the deviation of each measure from
this mean by subtracting the mean from each measure. Thus 6,
the first score, minus 10 equals −4; 8 − 10 = −2; 10 − 10 = 0;
12 − 10 = 2; and 14 − 10 = 4. The five deviations measured from
the mean are −4, −2, 0, 2, and 4. If we add these deviations without
regard to signs the sum is 12; and dividing 12 by 5 (N), we get 2.4 as
the mean of the five deviations from their mean, or the MD. The
formula for the MD when scores are ungrouped may be written

$$MD = \frac{\Sigma |x|}{N} \qquad (8)$$
(mean deviation for ungrouped measures)

in which the Σ|x| denotes the sum of the deviations from the mean
and N is, as before, the number of cases or items. The bars | |
enclosing the x indicate that signs are disregarded. The small letter x in the
formula always represents the deviation of a score X from its mean
M, i.e., x = X − M.


* Q3 may also be found by counting in 25% from the high-score end of the
distribution. To avoid confusion, the method given above is recommended to
the beginner.
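The example of the five scores can be reproduced with a minimal Python sketch of the mean deviation for ungrouped measures; the function name is an illustrative assumption.

```python
def mean_deviation(scores):
    """MD = sum of |x| / N, where x = X - M."""
    n = len(scores)
    mean = sum(scores) / n
    # Signs are disregarded: every deviation is treated as positive.
    return sum(abs(score - mean) for score in scores) / n


print(mean_deviation([6, 8, 10, 12, 14]))   # 2.4
```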


(2) CALCULATION OF MD FROM GROUPED DATA

In Table 8 the calculation of the MD for scores grouped into a
frequency distribution is illustrated by two problems. The mean of the
fifty Army Alpha scores in problem 1 has already been found in
Table 5, page 30, to be 170.80. To compute the MD of the scores in
this distribution we must take our deviations (x's) around this mean.
However, since the scores have been grouped into class-intervals, we
are unable to get the deviation of each separate score from the mean.
In lieu of separate score deviations, therefore, we take the deviation
of the midpoint of each interval from the mean. The substitution of
the midpoint for all of the scores within an interval is the only
difference between the computation of x's from grouped and from
ungrouped data. The x of 195-199, for example, is 26.20, found by
subtracting 170.80 (the mean) from 197.00 (the midpoint of the
interval). All of the x's are positive as far down as 170-174, as in each
case the midpoint is numerically larger than the mean. From the
interval 165-169 on down to the beginning of the series, the x's are
negative, as the midpoints of these intervals are all smaller than 170.80.
Thus the x of interval 165-169 is −3.80; and the x of the lowest
interval in the distribution, 140-144, is −28.80.

It will be helpful in calculating deviations from the mean to 
remember that the mean is always subtracted from the individual 
score or midpoint value. That is, x (deviation) = X (score or 
midpoint) - M (mean). The calculation is algebraic. When the score or 
midpoint is numerically larger than the mean the deviation is 
positive; when the score or midpoint is numerically smaller than the 
mean the deviation is negative. 

Column (4) Table 8, page 45, gives the deviation of each class- 
interval, as represented by its midpoint, from the mean of the dis- 
tribution. There are more scores on some intervals than on others; 
hence each midpoint deviation in column (4) must be “weighted” or 
multiplied by the number of scores (f) which it represents. This 
gives the fx column, column (5). The first fx is 26.20; for, since there 
is only one score on 195-199, we multiply the first x by 1. The next 
fx is 42.40, since each of the two scores on 190-194 has an x of 21.20. 
In the same way we obtain the other fx's by multiplying, in each 
case, the x in column (4) by its corresponding f in column (3). When 
all of the fx's have been calculated, the column is added without 
regard to sign, and the resulting sum is divided by N to give the MD. 
In the present problem the MD equals 502.00/50 or 10.04. 

The formula for the MD when measures are grouped into a 
frequency distribution is as follows: 

$$\mathrm{MD} = \frac{\sum |fx|}{N} \tag{9}$$

(mean deviation for scores grouped into a frequency distribution) 


The second problem in Table 8 shows the calculation of the MD 
for 200 cancellation scores grouped into a frequency distribution in 
class-intervals of four. The mean of this distribution was found to be 
119.44 (Table 5, page 30). Hence, the x of the topmost interval, 
135.5 to 139.5 (midpoint 137.50), from the mean is 18.06. Since the 
class-interval is constant in size, the next x may be found by 
subtracting 4 (the interval) from 18.06; and each succeeding x may be 
found by subtracting 4 from the x just preceding it. 

The fx's in column (5) are found, as shown in problem 1, by 
weighting each x by the f which it represents, that is, by the f 
opposite it. The sum of the fx column is 1063.88; and, since N is equal 
to 200, from formula (9) we obtain 5.32 as the MD of the scores in this 
distribution around their mean of 119.44. 

In a symmetrical or normal distribution the MD, when measured 
off on the scale above and below the mean, marks the limits of the 
middle 57.5% of the measures. The MD is always slightly larger, 
therefore, than the Q which marks off the limits of the middle 50%. 
A large MD means that the scores of the distribution tend to scatter 
widely around the central tendency; a small MD that they tend to be 
concentrated within a relatively narrow range. 
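
The grouped-data procedure can be sketched in Python as a check on both problems. The midpoints and frequencies below are transcribed from the Army Alpha and cancellation distributions used in this chapter; the function name `grouped_mean_deviation` is a hypothetical helper, not something defined in the text.

```python
def grouped_mean_deviation(midpoints, freqs):
    """Return (mean, MD) for a frequency distribution, using formula (9)."""
    n = sum(freqs)
    mean = sum(f * m for m, f in zip(midpoints, freqs)) / n
    # Each midpoint deviation is weighted by its frequency, signs disregarded.
    md = sum(f * abs(m - mean) for m, f in zip(midpoints, freqs)) / n
    return mean, md


# Problem 1: fifty Army Alpha scores, class-interval of 5.
alpha_mid = [197, 192, 187, 182, 177, 172, 167, 162, 157, 152, 147, 142]
alpha_f = [1, 2, 4, 5, 8, 10, 6, 4, 4, 2, 3, 1]

# Problem 2: two hundred cancellation scores, class-interval of 4.
canc_mid = [137.5, 133.5, 129.5, 125.5, 121.5, 117.5, 113.5, 109.5, 105.5]
canc_f = [3, 5, 16, 23, 52, 49, 27, 18, 7]

alpha_mean, alpha_md = grouped_mean_deviation(alpha_mid, alpha_f)
canc_mean, canc_md = grouped_mean_deviation(canc_mid, canc_f)
print(round(alpha_mean, 2), round(alpha_md, 2))  # 170.8 10.04
print(round(canc_mean, 2), round(canc_md, 2))    # 119.44 5.32
```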


4. The standard deviation or SD 


The standard deviation or SD is the measure of variability cus- 
tomarily employed in research. The SD differs from the MD in 
several respects. In calculating the MD we disregard signs and treat 
all deviations as positive; in finding the SD we avoid this difficulty 
of signs by squaring the separate deviations. Again, the squared 
deviations used in computing the SD are always taken from the mean 
of the distribution, and never from the median or mode. The 
conventional symbol used to denote the SD is the Greek letter sigma (σ). 

(1) CALCULATION OF SD FROM UNGROUPED DATA 

The standard deviation or σ is the square root of the mean of the 
squared deviations taken from the arithmetical mean of the 
distribution. To illustrate the calculation of the SD in a simple ungrouped 
series, let us consider the example given on page 48 to illustrate the 
calculation of the MD, in which the deviations of the five measures, 
6, 8, 10, 12, and 14 from their mean of 10 were found to be -4, -2, 
0, 2, and 4, respectively. Squaring each of these deviations, we 
obtain 16, 4, 0, 4, and 16. Summing these five squares (40) and dividing 
by five, we obtain the mean of the squares (8), and, extracting the 
square root, get 2.83, the SD of this series. The formula for the SD or σ 
when the series of scores is ungrouped is as follows: 


$$\sigma = \sqrt{\frac{\sum x^2}{N}} \tag{10}$$


(standard deviation calculated from ungrouped data) 

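
Formula (10) can be verified on the same five scores with a short Python sketch. Note that the divisor is N, as in the text, not N - 1; the standard-library function `statistics.pstdev` uses the same divisor and is called here only as a cross-check. The name `standard_deviation` is our own.

```python
import math
import statistics


def standard_deviation(scores):
    """Square root of the mean squared deviation from the mean (formula 10)."""
    n = len(scores)
    mean = sum(scores) / n
    return math.sqrt(sum((x - mean) ** 2 for x in scores) / n)


scores = [6, 8, 10, 12, 14]
sd = standard_deviation(scores)
print(round(sd, 2))                         # 2.83
print(round(statistics.pstdev(scores), 2))  # 2.83
```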

(2) CALCULATION OF SD FROM GROUPED DATA 

Table 8 illustrates the calculation of σ when scores are grouped 
into a frequency distribution. The process is identical with that 
used for ungrouped items, except that, in addition to squaring the x 
of each midpoint from the mean, we weight each of these squared 
deviations by the frequency which it represents, that is, by the 
frequency opposite it. This multiplication gives the fx² column. By 
simple algebra, x × fx = fx²; and accordingly the easiest way to 
obtain the entries in column fx² is to multiply the corresponding x's 
and fx's in columns (4) and (5). The first fx² entry, for example, is 
686.44, the product of 26.20 times 26.20; the second entry is 898.88, 
the product of 42.40 times 21.20; and so on to the end of the column. 
All of the fx² are necessarily positive since each negative x is matched 
by a negative fx. The sum of the fx² column (7978.00) divided by N 
(50) gives the mean of the squared deviations as 159.56; and the 
square root of this result is 12.63, the SD. The formula for σ when 
data are grouped into a frequency distribution is: 


$$\sigma = \sqrt{\frac{\sum fx^2}{N}} \tag{11}$$

(SD or σ for data grouped into a frequency distribution) 


Problem 2 of Table 8, page 46, furnishes another illustration of the 
calculation of σ from grouped data. In column (6), the fx² entries 
have been obtained, as in the previous problem, by multiplying each 
x by its corresponding fx. The sum of the fx² column is 8927.29; 
and N is 200. Hence, applying formula (11) we get 6.68 as the SD. 
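
A minimal Python sketch of formula (11), applied to both problems, is given below. The midpoints and frequencies are transcribed from the two distributions used in this chapter, and `grouped_sd` is a hypothetical helper name. Small differences in the last decimal of intermediate sums are to be expected, since the printed table rounds each entry.

```python
import math


def grouped_sd(midpoints, freqs):
    """SD of a frequency distribution, deviations taken from the mean (formula 11)."""
    n = sum(freqs)
    mean = sum(f * m for m, f in zip(midpoints, freqs)) / n
    # Each squared midpoint deviation is weighted by its frequency.
    sum_fx2 = sum(f * (m - mean) ** 2 for m, f in zip(midpoints, freqs))
    return math.sqrt(sum_fx2 / n)


alpha_mid = [197, 192, 187, 182, 177, 172, 167, 162, 157, 152, 147, 142]
alpha_f = [1, 2, 4, 5, 8, 10, 6, 4, 4, 2, 3, 1]
canc_mid = [137.5, 133.5, 129.5, 125.5, 121.5, 117.5, 113.5, 109.5, 105.5]
canc_f = [3, 5, 16, 23, 52, 49, 27, 18, 7]

alpha_sd = grouped_sd(alpha_mid, alpha_f)
canc_sd = grouped_sd(canc_mid, canc_f)
print(round(alpha_sd, 2))  # 12.63
print(round(canc_sd, 2))   # 6.68
```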

The standard deviation is less affected by sampling errors (p. 194) 
than is the Q or the MD and is a more stable measure of dispersion. 
In a normal distribution the SD, when measured off above and below 
the mean, marks the limits of the middle 68.26% (roughly the middle 
two-thirds) of the distribution.* This is approximately true also of 
the σ in less symmetrical distributions. For example, in the first 
problem in Table 8 the middle 65% of the scores fall between score 
183 (170.80 + 12.63) and score 158 (170.80 - 12.63).† The SD is 
larger than the MD which is, in turn, larger than Q. These 
relationships supply a rough check upon the accuracy of the measures of 
variability. 
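
The percentages quoted for the normal curve can be checked numerically. The sketch below uses the error function from Python's `math` module; the constants 0.7979 (MD as a fraction of σ) and 0.6745 (Q as a fraction of σ) are standard normal-curve values assumed here, not figures stated in this passage.

```python
import math


def share_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))


sd_share = share_within(1.0)                     # limits marked off by the SD
md_share = share_within(math.sqrt(2 / math.pi))  # MD = 0.7979 SD in a normal curve
q_share = share_within(0.6745)                   # Q = 0.6745 SD in a normal curve

print(round(100 * sd_share, 1))  # 68.3
print(round(100 * md_share, 1))  # 57.5
print(round(100 * q_share, 1))   # 50.0
```

The ordering SD > MD > Q follows directly: a wider band around the mean is needed to take in a larger share of the cases.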


II. Calculation of the SD by the Short Method 


1. Calculation of σ from grouped data 


On page 37, the Short Method of calculating the mean was 
outlined. This method consisted essentially in "guessing" or assuming 


TABLE 9 The calculation of the SD by the short method.‡ Data from 
Table 1. Calculations by the long method given for comparison 

1. Short Method 

  (1)        (2)      (3)    (4)    (5)     (6)
Scores     Midpoint    f      x'    fx'     fx'²
195-199      197       1      5      5       25
190-194      192       2      4      8       32
185-189      187       4      3     12       36
180-184      182       5      2     10       20
175-179      177       8      1      8        8    (+43)
170-174      172      10      0      0        0
165-169      167       6     -1     -6        6
160-164      162       4     -2     -8       16
155-159      157       4     -3    -12       36
150-154      152       2     -4     -8       32
145-149      147       3     -5    -15       75
140-144      142       1     -6     -6       36    (-55)
                    N = 50         -12      322

1. AM = 172.00     c = -12/50 = -.240     ci = -.240 × 5 = -1.20 

   c² = .0576 
   ci = -1.20 
   M = 170.80 

2. $$\mathrm{SD} = i\sqrt{\frac{\sum fx'^2}{N} - c^2} = 5\sqrt{\frac{322}{50} - .0576} = 12.63$$

2. Long Method 

  (1)        (2)      (3)    (4)      (5)       (6)        (7)
Scores     Midpoint    f      fX       x         fx         fx²
195-199      197       1     197     26.20     26.20      686.44
190-194      192       2     384     21.20     42.40      898.88
185-189      187       4     748     16.20     64.80     1049.76
180-184      182       5     910     11.20     56.00      627.20
175-179      177       8    1416      6.20     49.60      307.52
170-174      172      10    1720      1.20     12.00       14.40
165-169      167       6    1002     -3.80    -22.80       86.64
160-164      162       4     648     -8.80    -35.20      309.76
155-159      157       4     628    -13.80    -55.20      761.76
150-154      152       2     304    -18.80    -37.60      706.88
145-149      147       3     441    -23.80    -71.40     1699.32
140-144      142       1     142    -28.80    -28.80      829.44

                    N = 50  8540              502.00     7978.00

1. $$M = \frac{\sum fX}{N} = \frac{8540}{50} = 170.80$$

* See page 96. 

† See page 71 for method of calculating the percentage of scores falling 
between two points in a frequency distribution. 

‡ The calculation of the mean is repeated from Table 7. 



a mean, and later applying to this value a correction to give the 
actual mean. The Short Method may also be used to advantage in 
calculating the SD.* It is a decided time and labor saver in dealing 
with grouped data; and is well-nigh indispensable in the calculation 
of σ's in a correlation table (p. 134). 

The Short Method of calculating the SD is illustrated in Table 9. 
The computation of the mean is repeated in the table, as is also the 
calculation of the mean and SD by the direct or Long Method. This 
procedure affords a readier comparison of the two techniques. 


* The MD may also be calculated by the assumed mean or Short Method. 
The MD is so rarely used, however, that the Short Method of calculation 
(which is neither very short nor very satisfactory) is not given. 

The formula for computing σ by the Short Method is 


$$\sigma = i\sqrt{\frac{\sum fx'^2}{N} - c^2} \tag{12}$$


(SD from a frequency distribution when deviations are taken from 
an assumed mean) 


in which Σfx'² is the sum of the squared deviations in units of 
class-interval, taken from the assumed mean, c² is the squared correction 
in units of class-interval, and i is the class-interval. 

The calculation of σ by the Short Method may be followed in detail 
from Table 9. Deviations are taken from the assumed mean (172.0) 
in units of class-interval and entered in column (4) as x'. In 
column (5) each x' is weighted or multiplied by its f to give the fx'; and 
in column (6) the fx'² entries are found by multiplying each x' in column 
(4) by the corresponding fx' in column (5). The process is identical 
with that used in the Long Method except that the x's are all 
expressed in units of class-interval. This considerably simplifies the 
multiplication. The calculation of c has already been described on 
page 38: c is the algebraic sum of column (5) divided by N. The 
sum of the fx'² column is 322, and c² is .0576. Applying formula (12) 
we get 2.526 × 5 (interval) or 12.63 as the σ of the distribution. 
Formula (12) for the calculation of σ by the Short Method holds 
good no matter what the size of c, the correction in units of 
class-interval, or where the mean has been assumed. 
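
The Short Method can be sketched in Python as follows. The frequencies are those of the Army Alpha distribution, listed from the top interval down; `sd_short_method` and its parameter names are hypothetical. The second call moves the assumed mean to a different interval to illustrate that formula (12) gives the same σ wherever the mean is assumed.

```python
import math


def sd_short_method(freqs, assumed_index, interval):
    """Return (sigma, c) by formula (12).

    freqs are listed from the highest interval down; assumed_index is the
    position in that list of the interval holding the assumed mean.
    """
    n = sum(freqs)
    # Deviations x' in units of class-interval: positive above the AM, negative below.
    x_prime = [assumed_index - k for k in range(len(freqs))]
    sum_fx = sum(f * x for f, x in zip(freqs, x_prime))
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, x_prime))
    c = sum_fx / n  # correction in units of class-interval
    return interval * math.sqrt(sum_fx2 / n - c ** 2), c


freqs = [1, 2, 4, 5, 8, 10, 6, 4, 4, 2, 3, 1]

sigma, c = sd_short_method(freqs, assumed_index=5, interval=5)  # AM = 172
sigma_alt, c_alt = sd_short_method(freqs, assumed_index=3, interval=5)  # AM = 182

print(round(c, 3), round(sigma, 2))  # -0.24 12.63
print(round(sigma_alt, 2))           # 12.63
```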


2. Calculation of σ from the original measures or scores 


It will often save time and labor to apply the Short Method for 
computing σ directly to the ungrouped scores. The method is 
illustrated in Table 10. Note that the ten scores are ungrouped, and that 
it is not necessary even to arrange them in order of size. The assumed 
mean is taken at zero, and each score becomes at once a deviation 
(x') from this AM, that is, each score (X) is unchanged. The 
correction, c, is the difference between the actual mean (M) and the 
assumed mean (0), i.e., c = M - 0; hence c is simply M itself. The 
mean is calculated, as before, by summing the scores and dividing 
by N (see page 28). To find σ, we square the x' values (or the X's, which 
are the scores), sum them to get Σ(x')² or ΣX², divide by N, and 
subtract M², the correction squared. The square root of the result 
gives σ. 

TABLE 10 To illustrate the calculation of the SD from original scores 
when the assumed mean is taken at zero, and data are ungrouped 

Scores (X)    x' (or X)    (x')² or (X²)
    18            18            324
    25            25            625
    21            21            441
    19            19            361
    27            27            729
    31            31            961
    22            22            484
    25            25            625
    28            28            784
    20            20            400
   ---           ---           ----
   236           236           5734

AM = 0 
M = 236/10 = 23.6          N = 10 
c = 23.6 - 0 
  = 23.6 
c² = 556.96 

$$\sigma = \sqrt{\frac{5734}{10} - (23.6)^2} \times 1 \text{ (interval)} = \sqrt{16.44} = 4.05$$

A convenient formula is 


$$\sigma = \sqrt{\frac{\sum X^2}{N} - M^2} \tag{13}$$


or, replacing the M² by (ΣX)²/N², 


$$\sigma = \frac{\sqrt{N\sum X^2 - (\sum X)^2}}{N} \tag{14}$$

(σ calculated from original scores by the Short Method) 


This method of calculating σ is especially useful when there are 
relatively few scores, say fifty or less, and when the scores are 
expressed in not more than two digits,* so that the squares do not 
become unwieldy. A calculating machine and a table of squares will 
greatly facilitate computation. Simply sum the scores as they stand 
and divide by N to get M. Then enter the squares of the scores in 
the machine in order, sum, and substitute the result in formula (13) 
or formula (14). 

* For the application of this method to the calculation of coefficients of 
correlation, and a scheme for reducing the size of the original scores so as to 
eliminate the need for handling large numbers, see page 143. 
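
Formulas (13) and (14) are algebraically equivalent, which a short Python sketch on the ten scores of Table 10 makes easy to confirm. The variable names below are our own.

```python
import math

scores = [18, 25, 21, 19, 27, 31, 22, 25, 28, 20]
n = len(scores)
sum_x = sum(scores)                  # sum of the raw scores
sum_x2 = sum(x * x for x in scores)  # sum of the squared raw scores
mean = sum_x / n

# Formula (13): mean of the squares minus the squared mean.
sigma_13 = math.sqrt(sum_x2 / n - mean ** 2)

# Formula (14): the same quantity with M replaced by (sum of X) / N.
sigma_14 = math.sqrt(n * sum_x2 - sum_x ** 2) / n

print(sum_x, sum_x2, mean)  # 236 5734 23.6
print(round(sigma_13, 2))   # 4.05
print(round(sigma_14, 2))   # 4.05
```

Formula (14) postpones the division until the end, which is why it suits machine work: only whole numbers are handled until the final step.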


3. Effect upon σ of (a) adding a constant to each score, or (b) multiplying 
each score by the same number 


(a) If each score in a frequency distribution is increased by some 
set amount, say 5, the σ is unchanged. The table below provides a 
simple illustration. The mean of the original scores is 7 and σ is 1.41. 
When each score is increased by 5, the mean is 12 (7 + 5), but σ is 
still 1.41. Adding a constant (e.g., 5, 10, 15) to each score simply 
moves the whole distribution up the scale 5, 10, or 15 points. The 
mean is increased by the amount of the constant added, but the 
variability (σ) is not affected. If a constant is subtracted from each 
score, the distribution is moved down the scale by that amount; the 
mean is decreased by the amount of the constant, and σ, again, is 
unchanged. 


Original scores                    Original scores + 5
    (X)       x      x²               (X + 5)      x      x²
     9        2      4                   14        2      4
     8        1      1                   13        1      1
     7        0      0                   12        0      0
     6       -1      1                   11       -1      1
     5       -2      4                   10       -2      4
   ----            ----                 ----            ----
Sum 35              10               Sum 60              10

M = 35/5 = 7                       M = 60/5 = 12

σ = sqrt(10/5) = 1.41              σ = sqrt(10/5) = 1.41


(b) What happens to the mean and σ when each score is multiplied 
by a constant is shown in the table below: 


Original scores (X)     X × 10       x       x²
         9                90        20      400
         8                80        10      100
         7                70         0        0
         6                60       -10      100
         5                50       -20      400
       ----              ----              ----
Sum     35               350               1000

M = 7                   M = 70
σ = 1.41                σ = sqrt(1000/5) = 14.14


Each score in the list of five, shown above, has been multiplied 
by 10; and the net effect of this operation is to multiply the mean 
and the σ by 10. 
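
Both effects can be confirmed with Python's standard `statistics` module, whose `pstdev` function divides by N as the text does. The lists `shifted` and `scaled` are our own names for the transformed scores.

```python
import statistics

scores = [9, 8, 7, 6, 5]
shifted = [x + 5 for x in scores]  # a constant added to each score
scaled = [x * 10 for x in scores]  # each score multiplied by a constant

for label, data in (("original", scores), ("plus 5", shifted), ("times 10", scaled)):
    print(label, statistics.mean(data), round(statistics.pstdev(data), 2))
# original 7 1.41
# plus 5 12 1.41
# times 10 70 14.14
```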
4. The σ from combined distributions 


When two sets of scores have been thrown together into a single 
distribution, it is possible to calculate the σ of the total distribution 
from the σ's of the two component distributions. The formula is 


$$\sigma_{comb} = \sqrt{\frac{N_1(\sigma_1^2 + d_1^2) + N_2(\sigma_2^2 + d_2^2)}{N}} \tag{15}$$


(SD of a distribution obtained by combining two frequency 
distributions) 
in which 
σ₁ = SD of distribution 1 
σ₂ = SD of distribution 2 
d₁ = (M₁ - M_comb) 
d₂ = (M₂ - M_comb) 


N₁ and N₂ are the numbers of cases in component distributions 1 
and 2, respectively, and N = (N₁ + N₂). The M_comb is the mean of 
the combined distribution got from formula (3), page 31. 

An example will illustrate the use of formula (15). Suppose that in 
a class of 25 children, the mean (M₁) of an achievement test is 
80 and σ₁ = 15; and that in a second class of 75 children, the 
mean (M₂) on the same test is 70 and σ₂ = 25. What is the 
σ_comb of the total distribution of 100 cases? First, we find that 


$$M_{comb} = \frac{25 \times 80 + 75 \times 70}{100} = 72.50$$

We have, then, that d₁ = (80 - 72.5) = 7.5 and d₁² = 56.25; 
d₂ = (70 - 72.5) = -2.5 and d₂² = 6.25. Substituting 
in formula (15) for σ₁, σ₂, d₁, d₂, N₁, and N₂ we find that 

$$\sigma_{comb} = \sqrt{\frac{25(225 + 56.25) + 75(625 + 6.25)}{100}} = 23.32$$


Formula (15) may easily be extended to include more than two 
component distributions by adding N₃, σ₃, d₃, and so on. 
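
A Python sketch of formula (15), written so that it accepts any number of component distributions, is given below. The function name `combine` and the tuple layout (N, M, σ) are assumptions made for illustration, and the combined mean is taken to be the N-weighted mean of the component means.

```python
import math


def combine(groups):
    """Return (M_comb, sigma_comb) for a list of (N, M, sigma) tuples, by formula (15)."""
    n_total = sum(n for n, _, _ in groups)
    # Combined mean: each component mean weighted by its number of cases.
    m_comb = sum(n * m for n, m, _ in groups) / n_total
    # Each component contributes N * (sigma squared + d squared), d = M - M_comb.
    pooled = sum(n * (s ** 2 + (m - m_comb) ** 2) for n, m, s in groups)
    return m_comb, math.sqrt(pooled / n_total)


m_comb, s_comb = combine([(25, 80, 15), (75, 70, 25)])
print(m_comb, round(s_comb, 2))  # 72.5 23.32
```

Passing three or more tuples extends the calculation in the way the text describes, with no change to the function.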


III. The Coefficient of Variation, V 


It is often desirable to compare the variability of a given group 
upon two or more different tests; or to compare the variabilities of 
two or more groups upon the same test. We may wish, for example, 
to know whether 8-year-old girls are more variable in height than 
in weight; or whether 10-year-old boys are more variable than 
10-year-old girls in vocabulary or in memory span. The Q, MD, and SD 
are not suitable, ordinarily, for such comparisons. These measures 
give the absolute spread or dispersion of test scores around their 
means in terms of the units of the test. But owing to differences in 
measuring units, we cannot compare the variability in height and the 
variability in weight of a given group directly; nor can we compare 
the relative variability in height of two groups, say boys and girls, 
unless the means of the two distributions are at least approximately 
equal. To enable us to tell whether one group is more variable than 
another, we need a measure which takes account both of the central 
tendency and of the variability of the group, and which is 
independent of the units in which ability is expressed. One such measure 
is the ratio σ/M, called the coefficient of variation, or V. The 
formula for V is 


$$V = \frac{100\,\sigma}{M} \tag{16}$$


(the coefficient of variation or coefficient of relative variability) * 


The following illustrations will make the use of the formula clear. 
Consider, first, the case where abilities are measured in different 
units. A group of 7-year-old boys has a mean height of 45 inches 
with a σ of 2.5 inches; and a mean weight of 50 pounds with a σ of 
6.0 pounds. In which trait is the group more variable, height or 
weight? Since we cannot compare inches and pounds directly, it is 
impossible to answer this question by reference to the SD's of the 
height and weight distributions. But we can compare the relative 
variability of the two distributions in terms of their coefficients of 
variation. Thus, 


$$V_{ht} = \frac{100 \times 2.5}{45} = 5.6 \quad \text{by (16)}$$

and

$$V_{wt} = \frac{100 \times 6.0}{50} = 12 \quad \text{by (16)}$$


from which it appears that these boys are 5.6/12 or 47% as variable 
in height as in weight. 

* The multiplier 100 is introduced for the purpose of avoiding small 
fractional results. 

Now let us consider the case where variability is measured in the 
same units, but around different points on the scale. At the end of five 
minutes, a group of 50 children had worked an average of 20.50 
examples correctly, the σ being 5.24. At the end of ten minutes, the 
same group had worked an average of 34.80 examples correctly, the 
σ being 9.62. If we compared the σ's of the two distributions directly, 
we should probably be inclined to conclude that the group was nearly 
twice as variable at the end of the 10-minute period as it was at the 
end of the 5-minute period, since the σ has increased from 5.24 to 
9.62. This conclusion is correct as far as the absolute spread or 
variability within the group is concerned. But to compare the relative 
dispersion of the group in the two periods, we must note that, with 
the increase in σ, the means have also increased from 20.50 to 34.80. 
The coefficients of variation give the following results: 


For the 5-minute period: 

$$V = \frac{100 \times 5.24}{20.50} = 25.6$$

For the 10-minute period: 

$$V = \frac{100 \times 9.62}{34.80} = 27.6$$


Thus, instead of being about 50% as variable in the 5-minute period 
as in the 10, the group is 25.6/27.6 or 93% as variable, when the mean 
score is considered as well as the absolute variability. 
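
Formula (16) is a one-line computation, sketched here in Python for both illustrations. The function name is our own; the weight σ is entered as 6.0 pounds, the value that yields the V of 12 given for weight; and the V's are rounded to one decimal before the ratios are formed, as in the text.

```python
def coefficient_of_variation(sigma, mean):
    """V = 100 * sigma / M (formula 16)."""
    return 100 * sigma / mean


v_height = round(coefficient_of_variation(2.5, 45), 1)   # inches
v_weight = round(coefficient_of_variation(6.0, 50), 1)   # pounds
v_5min = round(coefficient_of_variation(5.24, 20.50), 1)
v_10min = round(coefficient_of_variation(9.62, 34.80), 1)

print(v_height, v_weight, round(100 * v_height / v_weight))  # 5.6 12.0 47
print(v_5min, v_10min, round(100 * v_5min / v_10min))        # 25.6 27.6 93
```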

Objection has been raised * to the use of V in comparing the 
relative variability of test scores because the "true" zero point of ability 
in mental and educational tests is unknown. This objection does not 
apply, of course, to physical and physiological measures since these 
have true zeros. How the lack of knowledge of the true zero in a 
mental test may affect V can be shown most readily, perhaps, by an 
example. Suppose that we have given a vocabulary test to a group 
of children, and have obtained a mean of 25 and a σ of 5. V will 
equal 20. Now suppose that we add 30 very easy items, say, to our 
vocabulary test. It is highly probable that every child will know all 
of the added words, and hence the mean score as well as every 
subject's score will be increased by 30. The absolute variability of the 
group (the σ) will, however, remain unchanged, as each subject 
occupies exactly the same relative position as before. An increase in 
the mean (from 25 to 55) without a corresponding increase in σ 
changes V from 20 to 9; and, since we could add 40 or 400 items as 
easily as 30, V appears to be a very unstable measure. 
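
The dependence of V on the zero point can be tabulated in a few lines of Python. The sketch assumes, as the example does, that every child passes every added item, so that the mean rises by the number of items added while σ stays at 5.

```python
def coefficient_of_variation(sigma, mean):
    """V = 100 * sigma / M (formula 16)."""
    return 100 * sigma / mean


mean, sigma = 25, 5
# Each added easy item is assumed to raise every score, and hence the mean, by one point.
v_by_items_added = {
    added: round(coefficient_of_variation(sigma, mean + added), 1)
    for added in (0, 30, 40, 400)
}
print(v_by_items_added)  # {0: 20.0, 30: 9.1, 40: 7.7, 400: 1.2}
```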

While theoretically correct, criticism of V because of the arbitrary 
nature of the zero point in mental and educational tests is not so 
generally destructive as it seems. Makers of standard psychological 
tests have been careful to begin their tests with items which, by 
experimental tryout, have been found to have minimal difficulty for 
the group for whom the test is designed. While admittedly arbitrary, 
such "zero" points are at least located at extremely low levels of 
difficulty in the ability measured by the test; hence it would be 
foolish to include additional easy items at the low end of the scale. The 
mean tells us how far the group has progressed, on the average, from 
the arbitrary zero point of the test. V shows, essentially, what 
percentage the variability is of this distance. Like M, V has a definite 
meaning for the test as it stands. If the range of difficulty in the test 
is altered, or the units changed, not only V, but M, is changed. V, 
therefore, is in a sense no more arbitrary than M, and the objections 
to this measure can be directed with equal force against M. 

* Franzen, R., "Statistical Issues," Journal of Educational Psychology, 1924, 
367-382. 

Thurstone, L. L., "The Absolute Zero in Intelligence Measurement," 
Psychological Review, 1928, 35, 175-197. 

V is most useful, perhaps, in comparing the variability of a group 
upon the same test administered under different conditions, as, for 
example, when a group of students works at a task with and without 
distraction. The zero point here, at least, remains substantially 
constant. V may also be used to compare two or more groups on the 
same test, as when 10-year-old boys and 10-year-old girls are 
compared in tests of logical memory or picture completion. In both of 
these cases it is probably justifiable to assume that the "true" zero 
point of ability is sensibly the same for the groups compared. 

It is, perhaps, most difficult to interpret V when the variability of 
a group upon different mental tests is a matter of interest. If we 
compare a group of girls for variability in paragraph reading and in 
arithmetic computation, it should be made plain that the V's refer 
only to the specific scales upon which performance has been 
measured. Other tests of reading and arithmetic may (and probably will) 
give different results because of difference in test units, range of 
difficulty covered by the test, and position of arbitrary zero points. 
But if one restricts his use of V to the particular measures which he 
has employed, this coefficient will furnish useful information. 


IV. When To Use the Various Measures of Variability 


1. Use the range 
   (1) When the data are too scant or too scattered to justify the 
       calculation of any other measure of variability. 
   (2) When a knowledge of the total spread of scores is all that is wanted. 

2. Use the Q 
   (1) For a quick, inspectional measure of variability. 
   (2) When there are scattered or extreme measures. 
   (3) When the degree of concentration around the median is sought. 

3. Use the MD 
   (1) When it is desired to weight all deviations according to their size. 
   (2) When extreme deviations should influence the measure of 
       variability, but not influence it unduly. 

4. Use the SD 
   (1) When the measure having the highest degree of reliability is sought 
       (p. 194). 
   (2) When it is desired that extreme deviations have a proportionally 
       greater influence upon the measure of variability. 
   (3) When coefficients of correlation or measures of reliability are 
       subsequently to be computed (p. 182). 


PROBLEMS 


1. Calculate the Q and σ for each of the four frequency distributions given 
   on page 40 under problem 1, Chapter 2. 


2. Calculate the σ of the 25 ungrouped scores given on page 24, problem 
   5(a), taking the AM at zero. Compare your result with the σ's 
   calculated from the frequency distributions of the same scores which you 
   tabulated in class-intervals of three and five units. 


3. For the following list of test scores, 
   52, 50, 56, 68, 65, 62, 57, 70 
   (a) Find the M and σ by method on page 55. 
   (b) Add 6 to each score and recalculate M and σ. 
   (c) Subtract 50 from each score, and calculate M and σ. 
   (d) Multiply each score by 5 and compute M and σ. 


4. (a) In Sample A (N = 150), M = 120 and σ = 20; in Sample B 
       (N = 75), M = 126 and σ = 22. What are the mean and SD of 
       A and B when combined into one distribution of 225 cases? 
   (b) What are the mean and SD obtained by combining the following 
       three distributions? 


       Distribution     N      M      σ
            I           20     60      8
            II         120     50     20
            III         60     40     12

5. Calculate coefficients of variation for the following traits: 


   Trait                  Unit of measurement     Group               M         σ
   Length of Head         mms.                    802 males           190.52     5.90
   Body Weight            pounds                  868,445 males       141.54    17.82
   Tapping Speed          M of 5 trials,          68 adults,          196.91    26.83
                          30" each                male and female
   Memory Span            No. repeated            263 males             6.60     1.13
                          correctly
   General Intelligence   Points scored           1101 adults         153.3     23.6
   (Otis Group
   Intell. Scale)


   Rank these traits in order for relative variability. Judged by their V's, 
   which trait is the most variable? which the least variable? which traits 
   have true zeros? 


6. (a) Why is the Q the best measure of variability when there are 
       scattered or extreme scores? 
   (b) Why does the σ weight extreme deviations more than does the MD? 


ANSWERS 
1. (1) Q = 3.38        (2) Q = 8.13 
       σ = 4.99            σ = 11.33 
   (3) Q = 4.50        (4) Q = 16.41 
       σ = 7.23            σ = 24.13 


2. σ of ungrouped scores = 6.72 
   σ of scores grouped in 3-unit intervals = 6.71 
   σ of scores grouped in 5-unit intervals = 6.78 


3. (a) M = 60      (b) M = 66      (c) M = 10      (d) M = 300 
       σ = 6.91        σ = 6.91        σ = 6.91        σ = 34.55 
4. (a) M = 122.0; σ = 20.9 
   (b) M = 48.00; σ = 18.05 
5. V's in order are 3.10; 12.59; 13.63; 17.12; 15.39. Ranked for relative 
   variability from most to least: Memory Span; General Intelligence; 
   Tapping Speed; Weight; Head Length. Last two traits have true zeros. 


CUMULATIVE DISTRIBUTIONS, GRAPHIC 
METHODS, AND PERCENTILES 


In Chapter 1, we learned how to represent the frequency 
distribution by means of the polygon and the histogram. In the present 
chapter, other descriptive methods will be considered: the 
cumulative frequency graph, the cumulative percentage curve or ogive, and 
certain simple graphical devices. Also, methods will be given for 
calculating percentiles and percentile ranks from frequency 
distributions and directly from graphs. 


I. The Cumulative Frequency Graph 
1. Construction of the cumulative frequency graph 


The cumulative frequency graph is another way of representing a 
frequency distribution by means of a diagram. Before we can plot 
a cumulative frequency graph, the scores of the distribution must be 
added serially or cumulated, as shown in Table 11, for the two 
distributions taken from Table 5, page 30. These two sets of scores 
have already been used to illustrate the frequency polygon and 
histogram in Figures 2, 4, and 5, pages 11, 17, and 18. The first two 
columns for each of the distributions in Table 11 repeat Table 5, page 
30, exactly; but in the third column (Cum. f) scores have been 
"accumulated" progressively from the bottom of the distribution 
upward. To illustrate, in the distribution of Army Alpha scores the 
first "cumulative frequency" is 1; 1 + 3, from the low end of the 
distribution, gives 4 as the next entry; 4 + 2 = 6; 6 + 4 = 10, etc. 
The last cumulative frequency is, of course, equal to 50 or N, the 
total frequency. 
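
The serial addition described above is exactly what `itertools.accumulate` from the Python standard library performs. In this sketch the Army Alpha frequencies are listed from the lowest interval (140-144) upward; the variable names are our own.

```python
from itertools import accumulate

# Army Alpha frequencies, lowest interval (140-144) first.
freqs = [1, 3, 2, 4, 4, 6, 10, 8, 5, 4, 2, 1]

# Running total from the bottom of the distribution upward.
cum_f = list(accumulate(freqs))
print(cum_f)  # [1, 4, 6, 10, 14, 20, 30, 38, 43, 47, 49, 50]
```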



TABLE 11 Cumulative frequencies for the two distributions given in 
Table 5, page 30 


         Army Alpha                          Cancellation
Scores       f     Cum. f        Scores              f     Cum. f
195-199      1       50          135.5 to 139.5       3      200
190-194      2       49          131.5 to 135.5       5      197
185-189      4       47          127.5 to 131.5      16      192
180-184      5       43          123.5 to 127.5      23      176
175-179      8       38          119.5 to 123.5      52      153
170-174     10       30          115.5 to 119.5      49      101
165-169      6       20          111.5 to 115.5      27       52
160-164      4       14          107.5 to 111.5      18       25
155-159      4       10          103.5 to 107.5       7        7
150-154      2        6                           N = 200
145-149      3        4
140-144      1        1

         N = 50


The two cumulative frequency graphs which represent the 
distributions of Table 11 are shown in Figures 8 and 9. Consider first 
the graph of the 50 Army Alpha scores in Figure 8. The 
class-intervals of the distribution have been laid off along the X-axis. There 
are 12 intervals, and by the "75% rule" given on page 12 there 
should be about 9 unit distances (each equal to one class-interval) 
laid off on the Y-axis. Since the largest cumulative frequency is 50, 
each of these Y-units should represent 50/9 or 6 scores 
(approximately). Instead of dividing up the total Y-distance into 9 units 
each representing 6 scores, however, we have, for convenience in 
plotting, divided the total Y-distance into 10 units of 5 scores each. This 
does not change significantly the 3:4 relationship of height to width 
in the figure. 


FIG. 8 Cumulative frequency graph 
(Data from Table 11, above) 
[X-axis: Scores, 139.5 to 199.5; Y-axis: Cumulative Frequencies, 0 to 50] 


FIG. 9 Cumulative frequency graph 
(Data from Table 11, p. 64) 
[X-axis: Scores, 103.5 to 139.5; Y-axis: Cumulative Frequencies] 

When plotting the frequency polygon the frequency on each inter- 
val is taken at the midpoint of the class-interval. But in constructing 
a cumulative frequency curve each cumulative frequency is plotted 
at the upper limit of the interval upon which it falls. This is because 
we are adding progressively from bottom up and hence each cumu- 
lative frequency carries through to the upper limit of the interval. 
The first point on the curve is one Y-unit (the cumulative frequency 
on 140-144) just above 144.5; the second point is 4 Y-units just 
above 149.5; the third, 6 Y-units just above 154.5, and so on to the 
last point which is 50 Y-units above 199.5. The plotted points are 
joined to give the S-shaped cumulative frequency graph. In order
to have the curve begin on the X-axis it is started at 139.5
(upper limit of 134.5 to 139.5), the cumulative frequency of which
is 0.

The cumulative frequency curve in Figure 9 has been plotted from
the second distribution in Table 11 by the method just described.
The curve begins at 103.5, the lower limit of the first class-interval,*
and ends at 139.5, the upper limit of the last interval; and cumulative
frequencies, 7, 25, 52, etc., are all plotted at the upper limits of their
respective class-intervals. The height of this graph was determined
by the "75% rule" as in the case of the curve in Figure 8. There are
9 class-intervals laid off on the X-axis; hence, since 75% of 9 is 7
(approximately), the height of the figure should be about seven
class-interval units. To determine the score value of each Y-unit
divide 200 (the largest cumulative frequency) by 7 to give 30
(approximately). Each of the 7 Y-units has been taken to represent
30 scores.
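
The plotting rule described above (each cumulative frequency placed at the upper limit of its interval, with a starting point of zero at the lower limit of the first interval) can be sketched in Python. This is an illustration only: the function name `cumulative_points` and the variable names are assumptions introduced here, and the data are the fifty Army Alpha scores whose first plotted points (1 above 144.5, 4 above 149.5, 6 above 154.5) are cited in the text.

```python
# Class intervals (integer score limits) and frequencies, lowest interval first.
# Data: the distribution of fifty Army Alpha scores discussed in the text.
intervals = [(140, 144), (145, 149), (150, 154), (155, 159),
             (160, 164), (165, 169), (170, 174), (175, 179),
             (180, 184), (185, 189), (190, 194), (195, 199)]
freqs = [1, 3, 2, 4, 4, 6, 10, 8, 5, 4, 2, 1]


def cumulative_points(intervals, freqs):
    """Return (x, cumulative f) pairs for a cumulative frequency graph.

    Each cumulative frequency is paired with the exact upper limit of its
    interval; a first point with frequency 0 is added at the exact lower
    limit of the lowest interval so that the curve begins on the X-axis.
    """
    points = [(intervals[0][0] - 0.5, 0)]
    running = 0
    for (_, high), f in zip(intervals, freqs):
        running += f
        points.append((high + 0.5, running))
    return points


points = cumulative_points(intervals, freqs)
for x, cum_f in points:
    print(f"{x:6.1f}  {cum_f:3d}")
```

The resulting pairs are what would be handed to any plotting routine; the sketch stops short of drawing the graph itself.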


II. Percentiles and Percentile Ranks
1. Calculation of percentiles in a frequency distribution 


We have learned (p. 31) that the median is that point in a frequency
distribution below which lie 50% of the measures or scores; and that
Q1 and Q3 mark points in the distribution below which lie,
respectively, 25% and 75% of the measures or scores. In exactly the
same way in which the median and quartiles are found, we may compute
points below which lie 10%, 43%, 85%, or any percent of the scores.
These points are called percentiles, and are designated, in general,
by the symbol P_p, the p referring to the percentage of cases below
the given value. P10, for example, is the point below which lie 10%
of the scores; P78, the point below which lie 78% of the scores. It
is evident that the median, expressed as a percentile, is P50; also
Q1 is P25, and Q3 is P75.

The method of calculating percentiles is essentially the same as
that employed in finding the median. The formula is

$$P_p = l + \left(\frac{pN - F}{f_p}\right) \times i \qquad (17)$$

(percentiles in a frequency distribution, counting from below up)

where

p = percentage of the distribution wanted, e.g., 10%, 88%, etc.
l = lower limit of the class-interval upon which P_p lies
pN = part of N to be counted off in order to reach P_p
F = sum of all scores upon intervals below l
f_p = number of scores within the interval upon which P_p falls
i = length of the class-interval


* Or the upper limit of the interval just below, i.e., 99.5 to 103.5.


In Table 12, the percentile points, P10 to P90, have been computed
by formula (17) for the distribution of scores made by the fifty
college students upon Army Alpha, shown in Table 1, page 5. The
details of calculation are given in Table 12.


TABLE 12 Calculation of certain percentiles in a frequency distribution
(Data are fifty Army Alpha scores, see Table 1, p. 5)


Scores      f    Cum. f    Percentiles
195-199     1      50      P100 = 199.5
190-194     2      49
185-189     4      47      P90 = 187.0
180-184     5      43      P80 = 181.5
175-179     8      38      P70 = 177.6
170-174    10      30      P60 = 174.5
165-169     6      20      P50 = 172.0
160-164     4      14      P40 = 169.5
155-159     4      10      P30 = 165.3
150-154     2       6      P20 = 159.5
145-149     3       4      P10 = 152.0
140-144     1       1
       N = 50              P0 = 139.5

CALCULATION OF PERCENTILE POINTS
10% of 50 =  5    149.5 + ((5 - 4)/2) × 5 = 152.0
20% of 50 = 10    159.5 + ((10 - 10)/4) × 5 = 159.5
30% of 50 = 15    164.5 + ((15 - 14)/6) × 5 = 165.3
40% of 50 = 20    169.5 + ((20 - 20)/10) × 5 = 169.5
50% of 50 = 25    169.5 + ((25 - 20)/10) × 5 = 172.0 (Mdn)
60% of 50 = 30    174.5 + ((30 - 30)/8) × 5 = 174.5
70% of 50 = 35    174.5 + ((35 - 30)/8) × 5 = 177.6
80% of 50 = 40    179.5 + ((40 - 38)/5) × 5 = 181.5
90% of 50 = 45    184.5 + ((45 - 43)/4) × 5 = 187.0


We may illustrate the method with P70. Here pN = 35 (70% of 50 = 35),
and from the Cum. f we find that 30 scores take us through 170-174 up
to 174.5, the lower limit of the interval next above. Hence, P70 falls
upon 175-179, and, substituting pN = 35, F = 30, f_p = 8 (frequency
upon 175-179), and i = 5 (class-interval) in formula (17), we find that
P70 = 177.6 (for detailed calculation, see Table 12). This result
means that 70% of the 50 students scored below 177.6 in the
distribution of Army Alpha scores. The other percentile values are
found in exactly the same way as P70. The reader should verify the
calculations of the P_p in Table 12 in order to become thoroughly
familiar with the method.
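
Readers who wish to check Table 12 by machine may find the following Python sketch of formula (17) useful. It is offered as one possible rendering, not as a prescribed routine: the function name `percentile` and its argument names are assumptions introduced here, and the data are the Army Alpha frequencies of Table 12 entered from the lowest interval upward.

```python
# Exact lower limits and frequencies of the Army Alpha distribution (Table 12),
# lowest interval first; every interval has length i = 5.
lower_limits = [139.5 + 5 * k for k in range(12)]
freqs = [1, 3, 2, 4, 4, 6, 10, 8, 5, 4, 2, 1]
width = 5


def percentile(p, lower_limits, freqs, width):
    """Formula (17): P_p = l + ((pN - F) / f_p) * i, counting from below up."""
    n = sum(freqs)
    target = p * n / 100            # pN, the number of scores to count off
    if target <= 0:
        return lower_limits[0]      # P0 is the lower limit of the first interval
    below = 0                       # F, the scores on intervals below l
    for l, f_p in zip(lower_limits, freqs):
        if f_p > 0 and below + f_p >= target:
            return l + (target - below) / f_p * width
        below += f_p
    return lower_limits[-1] + width  # P100, upper limit of the last interval


for p in range(0, 101, 10):
    print(f"P{p} = {percentile(p, lower_limits, freqs, width):.1f}")
```

When pN falls exactly on the boundary between two intervals, the function reports the boundary itself, which agrees with the values 159.5, 169.5, and 174.5 shown in Table 12.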

It should be noted that P0, which marks the lower limit of the
first interval (namely, 139.5) lies at the beginning of the distribution.
P100 marks the upper limit of the last interval, and lies at the end of
the distribution. These two percentiles represent limiting points. 
Their principal value is to indicate the boundaries of the percentile 
scale. 


2. Calculation of percentile ranks in a frequency distribution 


We have seen in the last section how percentiles, e.g., P15 or P62,
may be calculated directly from a frequency distribution. To repeat
what has been said above, percentiles are points in a continuous
distribution below which lie given percentages of N. We shall now
consider the problem of finding an individual’s percentile rank (PR);
or the position on a scale of 100 to which the subject’s score entitles
him. The distinction between percentile and percentile rank will be
clear if the reader remembers that in calculating percentiles he starts
with a certain percent of N, say 15% or 62%. He then counts into
the distribution the given percent and the point reached is the
required percentile, e.g., P15 or P62. The procedure followed in
computing percentile ranks is the reverse of this process. Here we
begin with an individual score, and determine the percentage of scores
which lies below it. If this percentage is 62, say, the score has a
percentile rank or PR of 62 on a scale of 100.

We may illustrate with Table 12. What is the PR of a man who
scores 163? Score 163 falls on interval 160-164. There are ten scores
up to 159.5, lower limit of this interval (see column Cum. f), and
four scores spread over this interval. Dividing 4 by 5 (interval
length) gives us .8 score per unit of interval. The score of 163, which
we are seeking, is 3.5 score units from 159.5, lower limit of the
interval within which the score of 163 lies. Multiplying 3.5 by .8 we
get 2.8 as the score-distance of 163 from 159.5; and adding 2.8 to 10
(number of scores below 159.5) we get 12.8 as the part of N lying
below 163. Dividing 12.8 by 50 gives us 25.6% as that proportion of
N below 163; hence the percentile rank of score 163 is 26. The
diagram below will clarify the calculation:


    |   .8   |   .8   |   .8   | .4 : .4 |   .8   |
  159.5    160.5    161.5    162.5  163.0  163.5    164.5


Ten scores lie below 159.5. Prorating the 4 scores on 160-164 over the
interval of 5, we have .8 score per unit of interval. Score 163 is just
.8 + .8 + .8 + .4 or 2.8 scores from 159.5; or score 163 lies 12.8 scores
or 25.6% (12.8/50) into the distribution.
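
The reverse procedure may likewise be sketched in Python. The function name `percentile_rank` and its arguments are assumptions made for this illustration; the logic simply prorates the frequency of the interval containing the score, as in the worked example for score 163, and the data are again those of Table 12.

```python
# Exact lower limits and frequencies of the Army Alpha distribution (Table 12),
# lowest interval first; every interval has length 5.
lower_limits = [139.5 + 5 * k for k in range(12)]
freqs = [1, 3, 2, 4, 4, 6, 10, 8, 5, 4, 2, 1]
width = 5


def percentile_rank(score, lower_limits, freqs, width):
    """Percent of N lying below `score`, prorating within its interval."""
    n = sum(freqs)
    below = 0                                  # scores below the interval
    for l, f in zip(lower_limits, freqs):
        if l <= score < l + width:
            within = (score - l) / width * f   # e.g. 3.5 units at .8 per unit
            return 100 * (below + within) / n
        below += f
    # Scores outside the tabulated range sit at one end of the scale.
    return 0.0 if score < lower_limits[0] else 100.0


print(round(percentile_rank(163, lower_limits, freqs, width), 1))
print(round(percentile_rank(181, lower_limits, freqs, width), 1))
```

A whole-number score such as 163 is entered as 163.0, the midpoint of the score-interval 162.5 to 163.5, in keeping with the convention used in the text.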

The PR of any score may be found in the same way. For example,
the percentile rank of 181 is 79 (verify it). The reader should note
that a score of 163 is taken as 163.0, midpoint of the score-interval
162.5 to 163.5. This means simply that the midpoint is assumed to
be the most representative value in a score-interval. The percentile
ranks for several scores may be read directly from Table 12. For
instance, 152 has a PR of 10, 172 (median) a PR of 50, and 187 a PR
of 90. If we take the percentile-points as representing approximately
the score-intervals upon which they lie, the PR of 160 (upon which
159.5 lies) is approximately 20 (see Table 12); the PR of 165 (upon
which 165.3 lies) is approximately 30; the PR of 170 is approximately
40; of 175, 60; of 178, 70; of 182, 80. These PR’s are not
strictly accurate, to be sure, but the error is slight.


III. The Cumulative Percentage Curve or Ogive


1. Construction of the ogive


The cumulative percentage curve or ogive differs from the cumulative
frequency graph in that frequencies are expressed as cumulative
percents of N on the Y-axis instead of as cumulative frequencies.
Table 13 shows how cumulative frequencies can be turned into
percentages of N. The distribution consists of scores made on a
reading test by 125 seventh-grade pupils. In columns (1) and (2)
class-intervals and frequencies are listed; and in column (3) the f's
have been cumulated from the low end of the distribution upward as
described before on page 63. These Cum. f's are expressed as
percentages of N (125) in column (4). The conversion of Cum. f's into
cumulative percents can be carried out by dividing each cumulative
f by N; e.g., 2 ÷ 125 = .016, 6 ÷ 125 = .048, and so on. A better
method, especially when a calculating machine is available, is to
determine first the reciprocal, 1/N, called the Rate, and multiply
each cumulative f in order by this fraction. As shown in Table 13,
the Rate is 1/125 or .008. Hence, multiplying 2 by .008, we get .016
or 1.6%; 6 X .008 = .048 or 4.8%; 12 X .008 = .096 or 9.6%, etc.


TABLE 13 Calculation of cumulative percentages to upper limits of
class-intervals in a frequency distribution

(The data represent scores on a reading test achieved
by 125 seventh-grade children)


    (1)            (2)     (3)         (4)
   Scores           f     Cum. f   Cum. Percent f
74.5 to 79.5        1      125        100.0
69.5 to 74.5        3      124         99.2
64.5 to 69.5        6      121         96.8
59.5 to 64.5       12      115         92.0
54.5 to 59.5       20      103         82.4
49.5 to 54.5       36       83         66.4
44.5 to 49.5       20       47         37.6
39.5 to 44.5       15       27         21.6
34.5 to 39.5        6       12          9.6
29.5 to 34.5        4        6          4.8
24.5 to 29.5        2        2          1.6
              N = 125

Rate = 1/N = 1/125 = .008


FIG. 10 Cumulative percentage curve or ogive plotted from the data
of Table 13, above (X-axis: Scores, 24.5 to 79.5; Y-axis: Cumulative
Percents, 0 to 100)
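
The Rate method lends itself to a short Python sketch. The variable names below are assumptions chosen for illustration; the frequencies are those of Table 13, entered from the lowest interval upward.

```python
# Frequencies of Table 13 (reading test, 125 seventh-grade pupils),
# lowest interval (24.5 to 29.5) first.
freqs = [2, 4, 6, 15, 20, 36, 20, 12, 6, 3, 1]
upper_limits = [29.5 + 5 * k for k in range(len(freqs))]

n = sum(freqs)
rate = 1 / n                      # the "Rate", 1/N = .008 for N = 125

cum_f = []
cum_percents = []
running = 0
for f in freqs:
    running += f                  # cumulate from the low end upward
    cum_f.append(running)
    cum_percents.append(round(100 * running * rate, 1))

for limit, cf, pct in zip(upper_limits, cum_f, cum_percents):
    print(f"up to {limit:5.1f}   Cum. f = {cf:3d}   Cum. % = {pct:5.1f}")
```

Multiplying by the reciprocal rather than dividing each time was a convenience for desk calculators; on a modern machine the two approaches are interchangeable.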

The curve in Figure 10 represents an ogive plotted from the data 
in column (4), Table 13. Class-intervals have been laid off on the 
X-axis, and a scale consisting of 10 equal distances, each represent- 
ing 10% of the distribution, has been marked off on the Y-axis. The 
first point on the ogive is placed 1.6 Y-units just above 29.5; the 
second point is 4.8 Y-units just above 34.5, etc. The last point is 
100 Y-units above 79.5, upper limit of the highest class-interval. 


2. Computing percentiles and percentile ranks from (a) the cumulative
percentage distribution and from (b) the ogive


(a) Percentiles may be readily determined by direct interpolation
in column (4), Table 13. We may illustrate by calculating the 71st
percentile. Direct interpolation between the percentages in column
(4) gives the following:


    66.4% of the distribution up to 54.5
    71.0% of the distribution up to 55.9
    82.4% of the distribution up to 59.5
    -----                          -----
    16.0%                            5.0

The 71st percentile lies 4.6% above 66.4%. By simple proportion,
x/5 = 4.6/16.0, or x = (4.6/16.0) × 5 = 1.4 (x is the distance of the
71st percentile from 54.5). The 71st percentile, therefore, is
54.5 + 1.4, or 55.9.
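
The interpolation just carried out by hand can be written as a small Python function. The name `percentile_from_cum_pct` is an assumption introduced for this sketch; the two lists reproduce the class limits and column (4) of Table 13, with 0% added at the lower limit of the first interval.

```python
# Exact class limits of Table 13 and the cumulative percent reached at each.
limits = [24.5 + 5 * k for k in range(12)]            # 24.5, 29.5, ..., 79.5
cum_pct = [0.0, 1.6, 4.8, 9.6, 21.6, 37.6, 66.4,
           82.4, 92.0, 96.8, 99.2, 100.0]


def percentile_from_cum_pct(p, limits, cum_pct):
    """Locate P_p by linear interpolation between cumulative percents."""
    for k in range(1, len(limits)):
        if cum_pct[k] >= p:
            span_pct = cum_pct[k] - cum_pct[k - 1]     # e.g. 16.0%
            span_x = limits[k] - limits[k - 1]         # e.g. 5.0 score units
            return limits[k - 1] + (p - cum_pct[k - 1]) / span_pct * span_x
    return limits[-1]


print(round(percentile_from_cum_pct(71, limits, cum_pct), 1))
print(round(percentile_from_cum_pct(50, limits, cum_pct), 2))
```

The sketch assumes that the cumulative percents increase strictly from one limit to the next, as they do in Table 13; an interval with zero frequency would need separate handling.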

Certain percentiles can be read directly from column (4). We 
know, for instance, that the 5th percentile is approximately 34.5; 
that the 22nd percentile is approximately 44.5; that the 38th percen- 
tile is approximately 49.5; and that the 92nd percentile is exactly 
64.5. Another way of expressing the same facts is to say that 21.6% 
of the seventh graders scored below 44.5, that 92% scored below 
64.5, etc.

Percentile ranks may also be determined from Table 13 by
interpolation. Suppose, for example, we wish to calculate the PR of
score 43. From column (4) we find that 9.6% of the scores are below
39.5. Score 43 is 3.5 (43.0 - 39.5) from this point. There are 5
score-units on the interval 39.5 to 44.5 which correspond to 12.0%
(21.6 - 9.6) of the distribution; hence, 3.5/5 X 12.0 or 8.4 is the
percentage distance of score 43 from 39.5. Since 9.6% (up to
39.5) + 8.4% (from 39.5 to 43.0) comprise 18% of the distribution,
this percentage of N lies below score 43. Hence, the PR of 43 is 18.
See detailed calculation below.


     9.6% of distribution up to 39.5 (given)
    18.0% of distribution up to score 43.0
    21.6% of distribution up to 44.5 (given)
    -----                       ----
    12.0%                        5.0


Score 43.0 is 3.5/5 X 12.0% or 8.4% from 39.5; hence score 43.0 is
9.6% + 8.4% or 18.0% into the distribution.
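
The same table supports the reverse reading, from score to percentile rank. The Python sketch below is illustrative only; the function name `pr_from_cum_pct` is an assumption, and the lists again hold the class limits and column (4) of Table 13.

```python
# Exact class limits of Table 13 and the cumulative percent reached at each.
limits = [24.5 + 5 * k for k in range(12)]            # 24.5, 29.5, ..., 79.5
cum_pct = [0.0, 1.6, 4.8, 9.6, 21.6, 37.6, 66.4,
           82.4, 92.0, 96.8, 99.2, 100.0]


def pr_from_cum_pct(score, limits, cum_pct):
    """Percentile rank of `score` by interpolation in the cumulative percents."""
    for k in range(1, len(limits)):
        if limits[k - 1] <= score <= limits[k]:
            fraction = (score - limits[k - 1]) / (limits[k] - limits[k - 1])
            return cum_pct[k - 1] + fraction * (cum_pct[k] - cum_pct[k - 1])
    # Scores outside the tabulated range sit at one end of the scale.
    return 0.0 if score < limits[0] else 100.0


print(round(pr_from_cum_pct(43, limits, cum_pct), 1))
```

For score 43 the function reproduces the steps above: 3.5/5 of the 12.0% lying on the interval 39.5 to 44.5, added to the 9.6% lying below 39.5.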

It should be noted that the cumulative percents in column (4) give 
the PR's of the upper limits of the class-intervals in which the scores 
have been tabulated. The PR of 74.5, for example, is 99.2; of 64.5, 
92.0; of 44.5, 21.6, etc. These PR’s are the ranks of given points in 
the distribution, and are not the PR’s of scores. 

(b) Percentiles and percentile ranks may also be determined
quickly and fairly accurately from the ogive of the frequency
distribution plotted in Figure 10. To obtain P50, the median, for
example, draw a line from 50 on the Y-scale parallel to the X-axis and
where this line cuts the curve drop a perpendicular to the X-axis.
This operation will locate the median at 51.5, approximately. The
exact median, calculated from Table 13, page 70, is 51.65. Q1 and
Q3 are found in the same way as the median. P25 or Q1 falls
approximately at 45.0 on the X-axis, and P75 or Q3 falls at 57.0. These
values may be compared with the calculated Q1 and Q3, which are 45.56
and 57.19, respectively. Other percentiles are read in the same way. To
find P62, for instance, begin with 62 on the Y-axis, go horizontally
over to the curve, and drop a perpendicular to locate P62
approximately at 54.

In order to read the percentile rank of a given score from the ogive, 
we reverse the process followed in determining percentiles. Score 71, 
for example, has a PR of 97, approximately (see Figure 10). Calcu- 
lation consists in starting with score 71 on the X-axis, going verti- 
cally up to the ogive, and horizontally across to the Y-axis to locate 
the PR at 97 on the cumulative percentage scale. The PR of score 47 
is found in the same way to be approximately 30.

It will be noted that percentiles and percentile ranks are usually
slightly in error when read from an ogive. If the curve is carefully
drawn, however, the diagram fairly large, and the scale divisions
precisely marked, percentiles and PR’s may be read to a degree of
accuracy sufficient for most purposes.


3. Other uses of the ogive 


(1) COMPARISON OF GROUPS 

A useful over-all comparison of two or more groups is provided
when ogives representing their scores on a given test are plotted upon
the same coordinate axes. An illustration is given in Figure 11,
page 74, which shows the ogives of the scores earned by two groups
of children (200 ten-year-old boys and 200 ten-year-old girls) upon
an arithmetic reasoning test of 60 items. Data from which these
ogives were constructed are given in Table 14.


TABLE 14 Frequency distributions of the scores made by 200 ten-year-old
boys and 200 ten-year-old girls on an arithmetic reasoning test

Columns: Scores; Boys (f, Cum. f, Cum. % f, Smoothed Cum. Percentage f);
Girls (f, Cum. f, Cum. % f, Smoothed Cum. Percentage f)

Several interesting observations can be made from Figure 11. The
boys’ ogive lies to the right of the girls’ over the entire range,
showing that the boys score consistently higher than the girls.
Differences in achievement as between the two groups are shown by the
distances separating the two curves at various levels. It is clear that
differences at the extremes (between the very high-scoring and the very
low-scoring boys and girls) are not so great as are differences over
the middle range. A more detailed analysis of the achievement of
these two groups comes out in a comparison of certain points in the
distribution. The boys’ median is approximately 42, the girls’ 32;
and the difference between these measures is represented in Figure 11
by the line AB. The difference between the boys’ Q1 and the girls’ Q1
is represented by the line CD; and the difference between the two
Q3’s is shown by the line EF. It is clear that the groups differ more
at the median than at either quartile, and are farther separated at
Q3 than at Q1.


FIG. 11 Ogives representing scores made by 200 boys and 200 girls
on an arithmetic reasoning test (X-axis: Scores; Y-axis: Cumulative
Percents)

(See Table 14, page 73)


The extent to which one distribution overlaps another, whether at
the median or at other designated points, can be determined quite
readily from their ogives. By extending the vertical line through B
(the boys' median) up to the ogive of the girls' scores, it is clear that
approximately 88% of the girls fall below the boys' median. Hence,
approximately 12% of girls exceed the median of the boys in arithmetic
reasoning. Computing overlap from boys to girls, we find that
approximately 76% of the boys exceed the girls’ median. The vertical
line through A (girls' median) cuts the boys’ ogive at approximately
the 24th percentile. Therefore 24% of the boys fall below the
girls’ median, and 76% are above this point. Still another illustration
may be helpful. Suppose the problem is to determine what percentage
of the girls score at or above the boys' 60th percentile. The
answer is found by locating first the point where the horizontal line
through 60 cuts the boys' ogive. We then find the point on the girls'
ogive directly above this value, and from here proceed horizontally
across to locate the percentile rank of this point at 93. Since 93%
of the girls fall below the boys' 60th percentile, about 7% score above
this point.
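
The overlap reading can be imitated numerically by treating each ogive as a series of straight-line segments. The Python sketch below is hypothetical throughout: the two sets of cumulative percents are invented for illustration (they are not the data of Table 14), and the function names `interp` and `percent_exceeding` are assumptions.

```python
# HYPOTHETICAL cumulative percents at common exact limits for two groups.
# These figures are invented for illustration and are not those of Table 14.
limits = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5]
boys_cum = [0.0, 5.0, 20.0, 50.0, 85.0, 100.0]
girls_cum = [0.0, 15.0, 50.0, 80.0, 95.0, 100.0]


def interp(x, xs, ys):
    """Piecewise-linear interpolation; xs must be strictly increasing."""
    if x <= xs[0]:
        return ys[0]
    for k in range(1, len(xs)):
        if x <= xs[k]:
            t = (x - xs[k - 1]) / (xs[k] - xs[k - 1])
            return ys[k - 1] + t * (ys[k] - ys[k - 1])
    return ys[-1]


def percent_exceeding(p, ref_cum, other_cum, limits):
    """Percent of the other group scoring above the reference group's P_p."""
    point = interp(p, ref_cum, limits)            # horizontal line to the ogive
    return 100.0 - interp(point, limits, other_cum)  # vertical line to the other


print(percent_exceeding(50, boys_cum, girls_cum, limits))  # girls above boys' median
print(percent_exceeding(50, girls_cum, boys_cum, limits))  # boys above girls' median
```

Reading the ogive "backwards" (from percent to score) requires cumulative percents that rise at every step, which is why the invented data contain no empty intervals.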


(2) PERCENTILE NORMS 

Norms are measures of achievement which represent the typical
performance of a designated group or groups. The norm for
10-year-old boys in height, or the norm for seventh-grade pupils in
City X in arithmetic, is usually the mean or the median for the group.
But norms may be much more detailed and may be reported for
other points in the distribution as, for example, Q1, Q3, and various
percentiles.

Percentile norms are especially useful in dealing with educational 
achievement examinations, when one wishes to evaluate and compare 
the achievement of a given student in a number of subject-matter 
tests. If the student earns a score of 63 on an achievement test in
arithmetic, and a score of 143 on an achievement test in English, we
have no way of knowing from the scores alone whether his achieve- 
ment is good, medium, or poor, or how his standing in arithmetic and 
in English compare. If, however, we know that a score of 63 in 
arithmetic has a PR of 52, and a score of 143 in English a PR of 
68, we may say at once that this student is average in arithmetic 
(52% of the students score lower than he) and good in English (68% 
score below him). 

Percentile norms may be determined directly from the smoothed
ogives of score distributions. Figure 12 represents the smoothed
ogives of the two distributions of scores in arithmetic reasoning given
in Table 14. Vertical lines drawn to the base line from points on the
ogive locate the various percentile points. In Table 15 below, selected
percentile norms in the arithmetic reasoning test have been tabulated
for boys and girls separately. This table of norms may, of course, be
extended by the addition of other intermediate or extreme values.
Calculated percentiles are included in the table for comparison with
percentiles read from the smoothed ogives. These calculated values
are useful as a check on the graphically determined points, but
ordinarily need not be found.


TABLE 15 Percentile norms for arithmetic reasoning test (Table 14)
obtained from smoothed ogives in Figure 12


                   Girls                  Boys
Cum. %’s     Ogive   Calculated     Ogive   Calculated
   99         52.0      49.0         57.5      54.5
   95         46.5      44.5         54.5      52.9
   90         43.5      42.7         52.5      50.9
   80         40.0      39.2         49.0      48.1
   70         37.0      36.9         46.5      46.1
   60         35.0      34.6         44.0      44.0
   50         32.5      32.5         41.5      41.8
   40         30.0      30.0         39.0      39.7
   30         27.0      27.5         35.0      34.8
   20         23.5      25.0         30.0      30.9
   10         18.5      18.0         24.5      25.2
    5         14.0      15.5         19.5      20.1
    1          8.5       3.3          6.5      14.5


FIG. 12 Smoothed ogives of the scores in Table 14 (X-axis: Scores;
Y-axis: Cumulative Percents)

It is evident that percentile norms read from an ogive are not
strictly accurate, but the error is slight except at the top and bottom
of the distribution. Estimates of these extreme percentiles from
smoothed ogives are probably more nearly true values than are the
calculated points, since the smoothed curve represents what we might
expect to get from larger groups or in additional samplings.

The ogives in Figure 12 were smoothed in order to iron out minor
kinks and irregularities in the curves. Owing to the smoothing
process, these curves are more regular and continuous than are the
original ogives in Figure 11, page 74. The only difference between the
process of smoothing an ogive and smoothing a frequency polygon
(p. 14) is that we average cumulative percentage frequencies in the
ogive instead of actual frequencies. Smoothed percentage frequencies
are given in Table 14. The smoothed cumulative percent frequency
to be plotted above 24.5, boys’ distribution, is the average of the
cumulative percents at 19.5, 24.5, and 29.5, or 10.0; for the same
point, girls’ distribution, the corresponding average is 23.0. Care
must be taken at the extremes of the distribution where the procedure
is slightly different. In the boys' distribution, for example, the
smoothed cumulative percent frequency at 9.5 is .3%, and at 59.5 it is
(99.0 + 100.0 + 100.0)/3 or 99.7. At 4.5 and 64.5, both of which lie
outside the boys' distribution, the cumulative percentage frequencies
are (0 + 0 + 0)/3 or 0 and (100 + 100 + 100)/3 or 100, respectively.
Note that the smoothed ogive extends one interval beyond the original
at both extremes of the distribution.
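
The averaging of each cumulative percent with its two neighbors may be sketched as follows. Both the function name `smooth_cum_percents` and the short series of cumulative percents are assumptions made for illustration (the figures are not those of Table 14); the treatment of the extremes follows the rule that the cumulative percent is 0 below the distribution and 100 above it.

```python
def smooth_cum_percents(cum_pct):
    """Three-point running average of cumulative percents.

    `cum_pct` holds the cumulative percents at successive exact limits,
    beginning with 0 (lower limit of the first interval) and ending with 100.
    The result has one extra point at each end, so the smoothed ogive
    extends one interval beyond the original at both extremes.
    """
    padded = [0.0, 0.0] + list(cum_pct) + [100.0, 100.0]
    return [sum(padded[k - 1:k + 2]) / 3 for k in range(1, len(padded) - 1)]


# HYPOTHETICAL cumulative percents at limits 29.5, 34.5, ..., 59.5.
limits = [29.5 + 5 * k for k in range(7)]
cum_pct = [0.0, 1.0, 10.0, 40.0, 80.0, 99.0, 100.0]

smoothed = smooth_cum_percents(cum_pct)
smoothed_limits = [limits[0] - 5] + limits + [limits[-1] + 5]

for x, y in zip(smoothed_limits, smoothed):
    print(f"{x:5.1f}  {y:6.2f}")
```

With these invented figures the value at the top limit is (99.0 + 100.0 + 100.0)/3, the same arithmetic as in the example at 59.5 above.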

There is little justification for smoothing an ogive which is already 
quite regular or an ogive which is very jagged and irregular. In the 
first instance, smoothing accomplishes little if anything; in the sec- 
ond, it may seriously mislead. A smoothed curve shows what we 
might expect to get if the test or sampling, or both, were different 
(and perhaps better) than they actually were. Smoothing should 
never be a substitute for getting additional data or for constructing 
an improved test. It should certainly be avoided when the group is 
small and the ogive very irregular. Smoothing is perhaps most useful 
when the ogives show small irregularities here and there (see Fig- 
ure 11) which may reasonably be assumed to have arisen from small 
and not very important factors.

IV. Other Graphical Methods


Data obtained from many problems in mental measurement, espe- 
cially those which involve the study of changes attributable to 
growth, practice, learning, and fatigue, may be treated profitably 
by graphical methods. Two widely used devices are the line graph, 
frequently found in experimental psychology, and the bar diagram 
more often met with, perhaps, in education. These two methods will 
be described in this section. 


1. The line graph 


Figure 13 shows an age-progress curve. This graph represents the
change in “logical memory for a connected passage” in boys and
girls from 8 to 18 years old. Norms for adults are also included
on the diagram. Age is represented on the horizontal or X-axis
and "average number of ideas reproduced" at each age level is
marked off on the vertical or Y-axis. Memory ability as measured
by this test rises to a peak at year 15 for both groups after which
there is a slight decline followed by a rise at the adult level. There
is a small but consistent sex difference throughout, the girls being
higher on the average at each age.

FIG. 13 Logical memory. Age is represented on X-line (horizontal);
Score, i.e., number of ideas remembered, on Y-line (vertical)
(After Pyle)

Figure 14 illustrates the learning or practice curve. These curves
show the improvement, in sending and receiving telegraphic messages,
resulting from successive trials at the same task over a period
of forty-eight weeks. Improvement as measured by the number of
letters sent or received per minute is indicated along the Y-axis.
Weeks of practice at the given task are represented by equal intervals
on the X-axis.

FIG. 14 Improvement in telegraphy. Weeks of practice on X-line;
number of letters per minute on Y-line
(After Bryan and Harter)

Figure 15 is a performance or practice "curve." It represents
twenty-five successive trials with the hand dynamometer made by
one man and one woman. A marked sex difference in strength of
grip is apparent throughout the practice period. Also as the
experiment progressed a tendency to fatigue is evident in both subjects.

FIG. 15 Hand dynamometer readings in kilograms for 25 successive
grips at intervals of ten seconds. Two subjects, a man and a woman
(X-axis: Trials)

Figure 16 is Ebbinghaus’ well-known “curve of retention." This
curve represents memory retention as measured by the percentage of
the original material retained after the passage of different time
intervals. The time intervals between learning and relearning are
laid off on the X-axis; and the percent retained, as measured by
relearning, on the Y-axis.

FIG. 16 Curve of retention. The numbers on the baseline give hours
elapsed from time of learning; numbers along Y-axis give
percent retained


2. The bar diagram


The bar graph is sometimes used in psychology to compare the 
relative amounts of some attribute (height, intelligence, educational 
achievement, etc.) possessed by two or more groups. In education
the bar graph may be used to compare (usually in percentage terms) 
several different variables. Examples are: the cost of instruction in 
various schools or in different counties; distribution of student time 
in and out of school; teachers’ salaries by states or districts; relative 
expenditures for various purposes. A common form of the bar 
graph is that in which a set of bars is used, the lengths of the bars 
being proportional to the amounts of the variable possessed. For 
emphasis, a space is usually left between the bars, which are drawn 
side by side and may be either vertical or horizontal.

A horizontal bar graph is shown in Figure 17. These bars represent
the percentage of officers in various branches of the military service
during World War I who received grades of A and B or C upon the
Army Alpha Examination. The bars are arranged in order, the group
receiving the highest percent of A's and B's being placed at the top.
It is clear from the diagram that the Engineers, who ranked first,
received about 95% A's and B's and about 5% C's. The Veterinary
Corps, which ranked last, received about 60% A's and B's and
40% C's.

FIG. 17 Comparative bar graph. The bars represent the percentage
in each division of the military service receiving A's and B's or C's

Another illustration of a bar graph is shown in Figure 18. The two
parallel rectangles or “bars” represent student enrollment in two city
high schools. Each bar is divided into four parts to represent
freshmen, sophomores, juniors, and seniors. The size of a division is
proportional to the percentage which each class is of the whole group.
This type of graph is often called a divided-bar graph.

FIG. 18 Divided bar graphs. The two bars represent student enrollment
in two high schools. Each bar is divided into four divisions. The length
of a division shows the proportion or percentage of students in that class


PROBLEMS 


1. The following distributions represent the achievement of two groups,
A and B, upon a memory test.


(a) Plot cumulative frequency graphs of Group A's and of Group B's
scores, observing the 75% rule.

(b) Plot ogives of the two distributions A and B upon the same axes.

(c) Determine P30, P60, and P90 graphically from each of the ogives and
compare graphically determined with calculated values.

(d) What is the percentile rank of score 55 in Group A's distribution?
In Group B's distribution?

(e) A percentile rank of 70 in Group A corresponds to what percentile
rank in Group B?

(f) What percent of Group A exceeds the median of Group B?


Scores    Group A    Group B
79-83        6          8
74-78        7          8
69-73        8          9
64-68       10         16
59-63       12         20
54-58       15         18
49-53       23         19
44-48       16         11
39-43       10         13
34-38       12          8
29-33        6          7
24-28        3          2

2. Construct an ogive of the following distribution of scores.


    Scores            f
159.5 to 169.5        1
149.5 to 159.5        5
139.5 to 149.5       13
129.5 to 139.5       45
119.5 to 129.5       40
109.5 to 119.5       30
 99.5 to 109.5       51
 89.5 to  99.5       48
 79.5 to  89.5       36
 69.5 to  79.5       10
 59.5 to  69.5        5
 49.5 to  59.5        1

               N = 285


Read off percentile norms for the cumulative percentages:
99, 95, 90, 80, 70, 60, 50, 40, 30, 20, 10, 5, and 1.


3. Given the following data from five cities in the United States, represent
the facts graphically by means of a bar graph.


            Percent of population which is

City    Native White    Foreign-born White    Negro
 A           65                 30               5
 B           60                 10              30
 C           50                 45               5
 D           40                 20              40
 E           30                 10              60

ANSWERS

                  Group A            Group B
               Ogive    Cal.      Ogive    Cal.
1. (c)  P30    46.0    45.81      48.5    48.69
        P60    56.0    55.77      59.75   59.85
        P90    74.0    73.64      75.5    74.81
   (d) 58; 47
   (e) 62
   (f) 39-40% of Group A exceed the median of Group B.


2. Read from ogive:

   Cum. Percents:   99    95     90     80     70     60     50
   Percentiles:    159   142.5  137.5  131.5  124.5  116.5  107

   Cum. Percents:   40    30     20     10      5      1
   Percentiles:    102    96.5   91     82.5   79     64.5

ADDITIONAL PROBLEMS AND QUESTIONS ON CHAPTERS 1-4


1. Describe the characteristics of those distributions for which the mean
is not an adequate measure of central tendency.

2. When is it inadvisable to use the coefficient of variation?

3. What is a multimodal distribution?

4. A student writes in a theme that by the application of eugenics it
would be possible to raise the intelligence of the race, so that more
people would be above the median I.Q. of 100. Comment on this.

5. Why cannot the σ of one test usually be compared directly with the σ
of another test?

6. What effect will an increase in N probably have upon Q?

7. What is the difference between a percentile and the ordinary percent
grade used in school?

8. Does a percentile rank of 65 earned by a given pupil mean that 65%
of the group make scores above him; that 65% make the same score;
or that 65% make scores below him?

9. Calculate the mean, median, mode, Q, and SD for each of the following
distributions:


(1) Scores    f      (2) Scores    f      (3) Scores    f
    90-99     2          14-15     3          25        1
    80-89    12          12-13     8          24        2
    70-79    22          10-11    15          23        6
    60-69    20           8-9     20          22        8
    50-59    14           6-7     10          21        5
    40-49     4           4-5      4          20        2
    30-39     1                               19        1
         N = 75              N = 60               N = 25


10. (a) ... histogram upon the same coordinate axes.
(b) Plot the distribution in 9 (2) as an ogive. Locate graphically the
median, Q1, and Q3. Determine the PR of score 9; of score 12.


ANSWERS

9. (1) Mean = 68.10      (2) Mean = 9.23      (3) Mean = 22.04
       Median = 68.75        Median = 9.10        Median = 22.06
       Mode = 70.05          Mode = 8.84          Mode = 22.10
       Q = 9.01              Q = 1.69             Q = .91
       SD = 12.50            SD = 2.48            SD = 1.34


10. (b) Median = 9.0; Q1 = 7.5; Q3 = 11.0 (Read from ogive)
PR of 9 = 50; of 12 = 84.5


THE NORMAL PROBABILITY CURVE


* 


I. The Meaning and Importance of the Normal
Probability Distribution


1. Introduction


In Figure 19 are four diagrams, two polygons and two histograms, 
which represent frequency distributions of data drawn from anthro- 
pometry, psychology, and meteorology. It is apparent, even upon 
superficial examination, that all of these graphs have the same gen- 
eral form—the measures are concentrated closely around the center 
and taper off from this central high point or crest to the left and 
right. There are relatively few measures at the “low-score” end of
the scale; an increasing number up to a maximum at the middle position;
and a progressive falling-off toward the “high-score” end of the


FIG. 19 Frequency distributions drawn from different fields
1. Form L I.Q. distribution and best-fitting normal curve, ages 2½ to 18.
(From McNemar, Quinn, The Revision of the Stanford-Binet Scale, p. 19.)
Horizontal axis: I.Q.
2. Memory span for digits, 123 adult women students. (After Thorndike.)
Horizontal axis: digit span.
3. Statures of 8585 adult males born in the British Isles. (After Yule.)
Horizontal axis: stature in inches.
4. Frequency distribution of barometer heights at Southampton: 4748
observations. (After Yule.) Horizontal axis: height in inches.

FIG. 20 Normal probability curve


scale. If we divide the area under each curve (the area between the 
curve and the X-axis) by a line drawn perpendicularly through the 
central high point to the baseline, the two parts thus formed will be 
similar in shape and very nearly equal in area. It is clear, therefore, 
that each figure exhibits almost perfect bilateral symmetry. The 
perfectly symmetrical curve, or frequency surface, to which all of 
the graphs in Figure 19 approximate, is shown in Figure 20. This 
bell-shaped figure is called the normal probability curve, or simply 
the normal curve, and is of great value in mental measurement. An 
understanding of the characteristics of the frequency distribution 
represented by the normal curve is essential to the student of experi- 
mental psychology and mental measurement. This chapter, there- 
fore, will be concerned with the normal distribution, and its frequency 
polygon, the normal probability curve. 


2. Elementary principles of probability 


Perhaps the simplest approach to an understanding of the normal 
probability curve is through a consideration of the elementary prin- 
ciples of probability. As used in statistics, the "probability" of a 
given event is defined as the expected frequency of occurrence of this 
event among events of a like sort. This expected frequency of occur- 
rence may be based upon a knowledge of the conditions determining 
the occurrence of the phenomenon, as in dice-throwing or coin-toss- 
ing, or upon empirical data, as in mental and social measurements. 

The probability of an event may be stated most simply, perhaps, 
as a ratio. We know, for example, that the probability of an unbiased
coin falling heads is 1/2, and that the probability of a die
showing a two-spot is 1/6. These ratios, called probability ratios, 
are defined by that fraction the numerator of which equals the 
desired outcome or outcomes and the denominator of which equals 
the total possible outcomes. A probability ratio always falls between 
the limits .00 (impossibility of occurrence) and 1.00 (certainty of 
occurrence). Thus the probability that the sky will fall is .00; that 
an individual now living will some day die is 1.00. Between these 
limits are all possible degrees of likelihood which may be expressed 
by appropriate ratios. 

Let us now apply these simple principles of probability to the
specific case of what happens when we toss coins.* If we toss one
coin, obviously it must fall either heads (H) or tails (T) 100% of the
time; and furthermore, since there are only two possible outcomes
in a given throw, a head or a tail is equally probable. Expressed
as a ratio, therefore, the probability of H is 1/2; of T 1/2; and


(H + T) = 1/2 + 1/2 = 1.00


If we toss two coins, (a) and (b), at the same time, there are four 
possible arrangements which the coins may take: 


(1)     (2)     (3)     (4)
a b     a b     a b     a b
H H     H T     T H     T T


Both coins (a) and (b) may fall H; (a) may fall H and (b) T;
(b) may fall H and (a) T; or both coins may fall T. Expressed as
ratios, the probability of two heads is 1/4 and the probability of two
tails 1/4. Also, the probability of an HT combination is 1/4, and of
a TH combination 1/4. And since it ordinarily makes no difference
which coin falls H or which falls T, we may add these two ratios (or
double the one) to obtain 1/2 as the probability of an HT combination.
The sum of our probability ratios is 1/4 + 1/2 + 1/4 or 1.00.

Let us go a step farther and increase the number of coins to three.
If we toss three coins (a), (b), and (c) simultaneously, there are
eight possible outcomes:

(1)   (2)   (3)   (4)   (5)   (6)   (7)   (8)
HHH   HHT   HTH   THH   HTT   THT   TTH   TTT
Expressed as ratios, the probability of three heads is 1/8 (combina- 
tion 1); of two heads and one tail 3/8 (combinations 2, 3, and 4); 


* Coin-tossing and dice-throwing furnish easily understood and often used
illustrations of the so-called “laws of chance.”

of one head and two tails 3/8 (combinations 5, 6, and 7); and of three
tails 1/8 (combination 8). The sum of these probability ratios is 
1/8 + 3/8 + 3/8 + 1/8, or 1.00. 
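
The listing method used above for two and three coins can be mechanized. The following Python sketch is offered only as an illustration; the function name `head_count_probabilities` is our own and is not part of the text. It writes down every arrangement of n coins and counts how many show each number of heads.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def head_count_probabilities(n_coins):
    """Return {number of heads: probability ratio} by listing every arrangement."""
    # Each coin falls H or T, so there are 2**n_coins equally likely arrangements.
    arrangements = list(product("HT", repeat=n_coins))
    counts = Counter(arrangement.count("H") for arrangement in arrangements)
    total = len(arrangements)
    return {heads: Fraction(c, total) for heads, c in sorted(counts.items())}


for heads, ratio in head_count_probabilities(3).items():
    print(heads, "heads:", ratio)
```

For three coins the ratios returned are 1/8, 3/8, 3/8, and 1/8, and their sum is 1.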

By exactly the same method used above for two and for three 
coins, we can determine the probability of different combinations of 
heads and tails when we have four, five, or any number of coins. 
These various outcomes may be obtained in a somewhat more direct 
way, however, than by writing down all of the different combinations 
which may occur. If there are n independent factors, the probability
of the presence or absence of each being the same, the “compound”
probabilities of the appearance of various combinations of factors
will be expressed by expansion of the binomial (p + q)^n. In this
expression p equals the probability that a given event will happen,
q the probability that the event will not happen, and the exponent n
indicates the number of factors (e.g., coins) operating to produce the
final result.* If we substitute H for p and T for q (tails = non-heads),
we have for two coins (H + T)^2; and squaring, the binomial
(H + T)^2 = H^2 + 2HT + T^2. This expansion may be written,


1 H^2   1 chance in 4 of 2 heads; probability ratio = 1/4
2 HT    2 chances in 4 of 1 head and 1 tail; probability ratio = 1/2
1 T^2   1 chance in 4 of 2 tails; probability ratio = 1/4
Total = 4


These outcomes are identical with those obtained above by listing 
the three different combinations possible when two coins are tossed. 
If we have three independent factors operating, the expression
(p + q)^n becomes for three coins (H + T)^3. Expanding this binomial,
we get H^3 + 3H^2T + 3HT^2 + T^3, which may be written,


1 H^3     1 chance in 8 of 3 heads; probability ratio = 1/8
3 H^2T    3 chances in 8 of 2 heads and 1 tail; probability ratio = 3/8
3 HT^2    3 chances in 8 of 1 head and 2 tails; probability ratio = 3/8
1 T^3     1 chance in 8 of 3 tails; probability ratio = 1/8
Total = 8


Again these results are identical with those got by listing the four 
different combinations possible when three coins are tossed. 


* We may, for example, consider our coins to be independent factors, the
occurrence of a head to be the presence of a factor and the occurrence of a tail
the absence of a factor. Factors will then be “present” or “absent” in the
various heads-tails combinations.
The binomial expansion may be applied still more generally to
those cases in which there are a larger number of independent factors
operating. If we toss ten coins simultaneously, for instance, we have
by analogy with the above, (p + q)^10. This expression may be written
(H + T)^10, H standing for the probability of a head, T for the
probability of a non-head (tail), and 10 for the number of coins
tossed. When the binomial (H + T)^10 is expanded, the terms are


H^10 + 10H^9T + 45H^8T^2 + 120H^7T^3 + 210H^6T^4 + 252H^5T^5 + 210H^4T^6
+ 120H^3T^7 + 45H^2T^8 + 10HT^9 + T^10


which may be summarized as follows: 
                                                               Probability
                                                                  Ratio
   1 H^10        1 chance in 1024 of all coins falling heads      1/1024
  10 H^9T       10 chances in 1024 of 9 heads and 1 tail         10/1024
  45 H^8T^2     45 chances in 1024 of 8 heads and 2 tails        45/1024
 120 H^7T^3    120 chances in 1024 of 7 heads and 3 tails       120/1024
 210 H^6T^4    210 chances in 1024 of 6 heads and 4 tails       210/1024
 252 H^5T^5    252 chances in 1024 of 5 heads and 5 tails       252/1024
 210 H^4T^6    210 chances in 1024 of 4 heads and 6 tails       210/1024
 120 H^3T^7    120 chances in 1024 of 3 heads and 7 tails       120/1024
  45 H^2T^8     45 chances in 1024 of 2 heads and 8 tails        45/1024
  10 HT^9       10 chances in 1024 of 1 head and 9 tails         10/1024
   1 T^10        1 chance in 1024 of all coins falling tails      1/1024

Total = 1024
These data are represented graphically in Figure 21 by a histogram
and frequency polygon plotted on the same axes. The eleven terms of
the expansion have been laid off at equal distances along the X-axis,
and the “chances” of the occurrence of each combination of H’s and
T’s are plotted as frequencies on the Y-axis. The result is a symmetrical
frequency polygon with the greatest concentration in the
center and the “scores” falling away by corresponding decrements
above and below the central high point. Figure 21 represents the
results to be expected theoretically when ten coins are tossed 1024
times.
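
The coefficients of the expansion need not be worked out by hand. As a brief sketch (the helper name `binomial_chances` is ours, chosen for illustration), the standard-library function `math.comb` gives the number of ways in which a stated number of tails can occur among n coins:

```python
from math import comb


def binomial_chances(n):
    """Coefficients of (H + T)**n, ordered from all heads to all tails."""
    return [comb(n, tails) for tails in range(n + 1)]


chances = binomial_chances(10)
print(chances)       # the eleven coefficients of the expansion
print(sum(chances))  # total number of equally likely arrangements
```

The eleven values agree with the table above, and their total is 2^10 = 1024.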

Many experiments have been conducted in which coins were 
tossed or dice thrown a great many times, with the idea of checking
theoretical against actual results. In one well-known experiment,*
twelve dice were thrown 4096 times. Each four-, five-, and six-spot 




* Weldon's experiment; see Yule, G. U., An Introduction to the Theory of 
Statistics (London: C. Griffin and Co., 1932), 10th ed., p. 258. 
FIG. 21 Probability surface obtained from the expansion of (H + T)^10


combination was taken as a “success” and each one-, two-, and three- 
spot combination as a "failure." Hence the probability of success 
and the probability of failure were the same. In a throw showing 
the faces 3, 1, 2, 6, 4, 6, 3, 4, 1, 5, 2, and 3, there would be five suc- 
cesses and seven failures. The observed frequency of the different 
numbers of successes and the theoretical outcomes obtained from the 
expansion of the binomial expression (p + q)^12 have been plotted on
the same axes in Figure 22. The student will note that the observed 


FIG. 22 Comparison of observed and theoretical results in throwing
twelve dice 4096 times. (After Yule.) Horizontal axis: number of
successes, 0 to 12; the theoretical curve and the actual curve are
plotted on the same axes.
frequencies correspond quite closely to the theoretical except for a
tendency to shift slightly to the right. If, as an experiment, the
reader will toss ten coins 1024 times, his results will be in close
agreement with the theoretical outcomes shown in Figure 21.
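
The coin-tossing experiment may also be imitated on a computer. The sketch below is an illustration under our own assumptions: the function name `toss_experiment`, the use of Python's pseudo-random generator in place of real coins, and the fixed seed are not part of the text. It "tosses" ten coins 1024 times and sets the observed frequencies beside the theoretical ones.

```python
import random
from collections import Counter
from math import comb


def toss_experiment(n_coins=10, n_throws=1024, seed=0):
    """Simulate n_throws tosses of n_coins fair coins; return counts by number of heads."""
    rng = random.Random(seed)  # fixed seed so that the run can be repeated
    observed = Counter(
        sum(rng.random() < 0.5 for _ in range(n_coins)) for _ in range(n_throws)
    )
    return [observed.get(heads, 0) for heads in range(n_coins + 1)]


observed = toss_experiment()
expected = [comb(10, heads) for heads in range(11)]
for heads, (obs, exp) in enumerate(zip(observed, expected)):
    print(f"{heads:2d} heads: observed {obs:4d}, expected {exp:4d}")
```

The observed counts will differ somewhat from run to run (that is, from seed to seed), as chance fluctuations would lead us to expect, while clustering about the theoretical values.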

Throughout the discussion in this section, we have taken the probability
of occurrence (e.g., H) and the probability of non-occurrence
(non-H or T) of a given factor to be the same. This is not a necessary
condition, however. For instance, the probability of an event's
happening may be only 1/5; of its not happening, 4/5. Any probability
ratio is possible as long as (p + q) = 1.00. But distributions
obtained from the expansion of (p + q)^n when p is not equal to q are
“skewed” or asymmetrical and are not normal (p. 116).

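The effect of unequal p and q can be seen by expanding (p + q)^n numerically. In the following sketch the function name `binomial_distribution` and the choice of n = 10 are ours, made purely for illustration; exact fractions are used so that no rounding enters.

```python
from fractions import Fraction
from math import comb


def binomial_distribution(n, p):
    """Terms of (p + q)**n: probability of 0, 1, ..., n occurrences."""
    q = 1 - p
    return [comb(n, k) * p**k * q ** (n - k) for k in range(n + 1)]


fair = binomial_distribution(10, Fraction(1, 2))    # p = q: symmetrical
biased = binomial_distribution(10, Fraction(1, 5))  # p = 1/5, q = 4/5: skewed

print([float(term) for term in fair])
print([float(term) for term in biased])
```

With p = 1/2 the list of terms reads the same forward and backward; with p = 1/5 the terms pile up toward the low end and trail off toward the high end.
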

3. Use of probability curve in mental measurement 


The frequency curve plotted in Figure 21 from the expansion of
the expression (H + T)^10 is a symmetrical many-sided polygon. If
the number of factors (e.g., coins) determining this polygon were
increased from 10 to 20, to 30, and then to 100, say (the baseline
extent remaining the same), the faces of the polygon would increase
regularly in number. With each increase in the number of factors,
the faces of the figure would become shorter, and the points on the
frequency surface would move closer together. Finally, when the
number of factors became very large, that is, when n in the expression
(p + q)^n became infinite, the polygon would exhibit a perfectly
smooth surface like that of the curve in Figure 20. This “ideal”
polygon or “normal” curve represents the frequency of occurrence of
various combinations of a very large number of equal, similar, and
independent factors (e.g., coins), when the probability of the appearance
(e.g., H) or non-appearance (e.g., T) of each factor is the
same.
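
The approach of the polygon to the smooth curve may be checked numerically. The sketch below rests on a standard result not derived in this section, namely that for n fair coins the number of heads has mean n/2 and σ equal to √n/2; the function names are our own. It compares the central ordinate of the binomial with that of the normal curve having the same mean and σ.

```python
from math import comb, exp, pi, sqrt


def binomial_height(n, heads):
    """Probability of exactly `heads` heads when n fair coins are tossed."""
    return comb(n, heads) / 2**n


def normal_height(n, heads):
    """Ordinate of the normal curve with mean n/2 and sigma sqrt(n)/2."""
    mean, sigma = n / 2, sqrt(n) / 2
    x = heads - mean
    return exp(-x * x / (2 * sigma * sigma)) / (sigma * sqrt(2 * pi))


def central_gap(n):
    """Relative difference of the two ordinates at the center (n even)."""
    middle = n // 2
    return abs(binomial_height(n, middle) / normal_height(n, middle) - 1)


for n in (10, 20, 30, 100):
    print(n, round(central_gap(n), 4))
```

The relative difference shrinks steadily as the number of coins is increased from 10 to 100.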

If we compare the four graphs plotted from measures of height, 
intelligence, memory span, and barometric readings in Figure 19, 
with the normal probability curve in Figure 20, the similarity of 
these diagrams to the normal curve is clearly evident. The resem- 
blance of these and many other distributions to the normal seems to 
express a general tendency of quantitative data to take the symmetri- 
cal, bell-shaped form. This general tendency may be stated in the 
form of a “principle” as follows: measurements of many natural 
phenomena and of many mental and social traits under certain con- 
ditions tend to be distributed symmetrically about their means in 




proportions which approximate those of the normal probability
distribution.

Much evidence has accumulated to show that the normal distribu- 
tion serves to describe the frequency of occurrence of many variable 
facts with a relatively high degree of accuracy. Various phenomena 
which follow the normal probability curve (at least approximately) 
may be classified as follows: 


1. Biological statistics: the proportion of male to female births 
for the same country or community over a period of years; the pro- 
portion of different types of plants and animals in cross-fertilization 
(the Mendelian ratios). 

2. Anthropometrical data: height, weight, cephalic index, etc., for
large groups of the same age and sex. 

3. Social and economic data: rates of birth, marriage, or death 
under certain constant conditions; wages and output of large num- 
bers of workers in the same occupation under comparable conditions. 

4. Psychological measurements: intelligence as measured by 
standard tests; speed of association, perception-span, reaction-time; 
educational test scores, e.g., in spelling, arithmetic, reading. 

5. Errors of observation: measures of height, speed of movement, 
linear magnitudes, physical and mental traits, and the like, contain
errors which are as likely to cause them to deviate above as below 
their true values. Chance errors of this sort vary in magnitude and 
sign and occur in frequencies which follow closely the normal prob- 
ability curve.* 


It is an interesting speculation that many frequency distributions
of scores and other measures are similar to those obtained by tossing
coins or throwing dice because the former, like the latter, are actually
probability distributions. The symmetrical normal distribution, as
we have seen, represents the probability of occurrence of the various
possible combinations of a great many factors (e.g., coins). In a
normal distribution all of the n factors are taken to be similar,
independent, and equal in strength; and the probability that each will be
present (e.g., show an H) or absent (e.g., show a T) is the same.
The appearance on a coin of a head or a tail is undoubtedly determined
by a large number of small (or “chance”) influences as liable
to work one way as another. The twist with which the coin is spun
may be important, as well as the height from which it is thrown, the
weight of the coin, the kind of surface upon which it falls, and many

* This topic is treated in Chapter 8.
other circumstances of a like sort. By analogy, the presence or absence
of each one of the large number of genetic factors which determine
the shape of a man’s head, or his intelligence, or his personality,
may depend upon a host of adventitious influences whose net effect 
we call “chance.” 

But the striking similarity of obtained and probability distribu- 
tions should not lead us to conclude that all distributions of mental 
and physical traits which exhibit the bell-shaped form have neces- 
sarily arisen through the operation of those principles which govern 
the appearance of dice or coin combinations. The factors which 
determine musical ability, let us say, or mechanical skill are too little 
known to justify the assumption, a priori, that they combine in the 
same proportions as do the head and tail combinations in “chance”
distributions of coins. Moreover, the psychologist usually constructs
his tests with the normal hypothesis definitely in mind. The resulting
symmetrical distribution is to be taken, then, as evidence of the
success of his efforts rather than as conclusive proof of the “normality”
of the trait being measured.*

The selection of the normal rather than some other type curve is 
sufficiently warranted by the fact that this distribution generally does 
fit the data better, and is more useful. But the “theoretical justification
and the empirical use of the normal curve are two quite different
matters.” †


II. Properties of the Normal Probability Distribution 


1. The equation of the normal curve 


The equation of the normal probability curve reads 


$$ y = \frac{N}{\sigma\sqrt{2\pi}}\, e^{-\frac{x^{2}}{2\sigma^{2}}} \qquad (18) $$

(equation of the normal probability curve)

in which
x = scores (expressed as deviations from the mean) laid off along
the baseline or X-axis.
* McNemar, Q., The Revision of the Stanford-Binet Scale (Boston:
Houghton Mifflin Co., 1942), Chapter II.

† Jones, D. C., A First Course in Statistics (London: G. Bell and Sons, 1921),
p. ...
y = the height of the curve above the X-axis, i.e., the frequency of a
given x-value or the number achieving a certain score.


The other terms in the equation are constants:


N = number of cases.

σ = standard deviation of the distribution.

π = 3.1416 (the ratio of the circumference of a circle to its
diameter).

e = 2.7183 (base of the Napierian system of logarithms).


When N and σ are known, it is possible from equation (18) to
compute (1) the frequency (or y) of a given value x, i.e., the number
of individuals making a certain score; and (2) the number, or percentage,
of individuals scoring between two points, or above or below
a given point in the distribution. But these calculations are rarely
necessary, as tables are available from which this information may
be readily obtained. A knowledge of these tables (Table A, p. 424)
is extremely valuable in the solution of a number of problems. For
this reason it is very desirable that the construction and use of
Table A be clearly understood.
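
Equation (18) can be evaluated directly when tables are not at hand. The following is a minimal sketch; the function name `normal_frequency` and the sample values N = 1000 and σ = 10 are hypothetical, chosen only to illustrate the computation.

```python
from math import exp, pi, sqrt


def normal_frequency(x, n_cases, sigma):
    """Equation (18): ordinate y at deviation x from the mean."""
    return n_cases / (sigma * sqrt(2 * pi)) * exp(-(x**2) / (2 * sigma**2))


# Hypothetical distribution: N = 1000 cases, sigma = 10 score units.
y_at_mean = normal_frequency(0, 1000, 10)
y_at_one_sigma = normal_frequency(10, 1000, 10)
print(round(y_at_mean, 2), round(y_at_one_sigma, 2))
print(round(y_at_one_sigma / y_at_mean, 5))  # ordinate at 1 sigma as a fraction of the maximum
```

Because the curve is symmetrical, `normal_frequency(-x, ...)` and `normal_frequency(x, ...)` return the same height.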


2. Table of areas under the normal curve


Table A gives the fractional parts of the total area under the
normal curve found between the mean and ordinates (y's) erected at
various distances from the mean. In Table A distances along the
X-axis are measured in σ units (see Fig. 20). The total area under
the curve (the number of scores in the distribution) is taken arbitrarily
to be 10,000, because of the greater ease with which fractional
parts of the total area may then be calculated.

The first column of the table, x/σ, gives distances in tenths of σ
measured off on the baseline of the normal curve from the mean as
origin. We have already learned that x = X − M, i.e., that x measures
the deviation of a score X from M. If x is divided by σ, deviation
from the mean is expressed in σ-units. Such σ-deviation scores are
often called standard scores, or z-scores (z = x/σ). Distances from
the mean in hundredths of σ are given by the headings of the columns.
To find the number of cases in a normal distribution between the
mean and the ordinate erected at a distance of 1σ from the mean, go
down the x/σ column until 1.0 is reached, and in the next column
under .00 take the entry opposite 1.0, viz., 3413. This figure means
that 3413 cases in 10,000, or 34.13% of the entire area of the curve, lie
between the mean and 1σ. Put more exactly, 34.13% of the cases in
a normal distribution fall within the area bounded by the baseline of
the curve, the ordinate erected at the mean, the ordinate erected at
a distance of 1σ from the mean, and the curve itself (see Fig. 20,
p. 87). To find the percentage of the distribution between the mean
and 1.57σ, say, go down the x/σ column to 1.5, then across horizontally
to the column headed .07, and take the entry 4418. This means
that in a normal distribution, 44.18% of the area (N) lies between
the mean and 1.57σ.
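
Entries of the kind found in Table A may be reproduced with the error function. The sketch below is an illustration only; the name `area_mean_to` is ours, and it relies on the standard identity that the area between the mean and z is one-half of erf(z/√2).

```python
from math import erf, sqrt


def area_mean_to(z, total=10_000):
    """Area between the mean and z sigma, scaled to a total area of 10,000."""
    return total * 0.5 * erf(abs(z) / sqrt(2))


print(round(area_mean_to(1.0)))   # entry opposite 1.0 under .00
print(round(area_mean_to(1.57)))  # entry opposite 1.5 under .07
```

Rounded to whole cases, the two values are 3413 and 4418, the entries read from the table in the examples above.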

We have so far considered only σ-distances measured in the positive
direction from the mean; that is, we have taken account only of
the right half (the high-score end) of the normal curve. Since the
[...]
mean and ±1σ are 68.26% of the cases in a normal distribution
(see also Fig. 20).
While the normal curve does not actually meet the baseline until
[...]
10,000, or 99.73% of the entire distribution, lie within the limits −3σ
and +3σ. By cutting off the curve at these two points, therefore, we
disregard only .27 of 1% of the distribution, a negligible amount
except in very large samples.


3. Relationships among the constants of the normal probability curve 


In the normal probability curve, the mean, the median, and the 
mode all fall exactly at the midpoint of the distribution and are 
numerically equal. Since the normal curve is bilaterally symmetri- 
cal, all of the measures of central tendency must coincide at the 
center of the distribution.
The measures of variability include certain constant fractions of
the total area of the normal curve, which may be read from Table A.
Within ±1σ of the mean lie the middle two-thirds (approximately)
of the cases in the normal distribution. Within ±2σ of the mean
are found 95% (approximately) of the distribution; and
within ±3σ of the mean are found 99.7% (approximately 100%)
of the distribution. There are 68 chances (approximately) in 100
that a score will lie within ±1σ from the mean in the normal distribution;
there are 95 chances in 100 that it will lie within ±2σ from the
mean; and 99.7 chances in 100 that it will lie within ±3σ from the
mean.
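
These constant fractions can be verified without the table. In the short sketch below the function name `percent_within` is our own; it uses the fact that the area within ±kσ of the mean is erf(k/√2).

```python
from math import erf, sqrt


def percent_within(k):
    """Percentage of a normal distribution lying within +/- k sigma of the mean."""
    return 100 * erf(k / sqrt(2))


for k in (1, 2, 3):
    print(k, round(percent_within(k), 2))
```

The computed value for ±1σ is 68.27%, which differs in the last place from the 68.26% obtained by doubling the rounded table entry 3413.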

Instead of σ the Q may be used as the unit of measurement in
determining areas within given parts of the normal curve. In the
normal curve the Q (p. 46) is generally called the probable error or
PE. The relationships between PE and σ are given in the following
equations:

PE = .6745σ
σ = 1.4826 PE


from which it is seen that σ is always about 50% larger than the
PE (p. 52).

By interpolation in Table A we find that ±.6745σ or ±1 PE include
the 25% just above and the 25% just below the mean. This
part of the normal curve, sometimes called the “middle 50,” is important
because it is often taken to define the range of “normal” performance.
The upper 25% is considerably better, and the lowest
25% considerably poorer in performance than the typical middle or
average group. From Table A we find also that ±2 PE (or ±1.3490σ)
from the mean include 82.26% of the measures in the normal curve;
that ±3 PE (or ±2.0235σ) include 95.70%; and that ±4 PE (or
±2.6980σ) include 99.30%.
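
The PE relationships may be checked with Python's `statistics.NormalDist`. This is a sketch for illustration; the names `pe` and `percent_within_pe` are ours. The PE is found as the σ-distance that cuts off the upper quartile, i.e., the point below which 75% of the unit normal distribution falls.

```python
from statistics import NormalDist

unit_normal = NormalDist()       # mean 0, sigma 1
pe = unit_normal.inv_cdf(0.75)   # probable error in sigma units


def percent_within_pe(k):
    """Percentage of a normal distribution within +/- k PE of the mean."""
    return 100 * (unit_normal.cdf(k * pe) - unit_normal.cdf(-k * pe))


print(round(pe, 4), round(1 / pe, 4))
for k in (1, 2, 3, 4):
    print(k, round(percent_within_pe(k), 2))
```

The values agree with those read from Table A to within a unit in the last place, the small differences arising from interpolation in the table.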


III. Measuring Divergence from Normality


1. Skewness 


In a frequency polygon or histogram, usually the first thing which
strikes the eye is the degree of symmetry in the figure. In the normal 
curve the mean, the median, and the mode all coincide and there is 
perfect balance between the right and left halves of the figure. A 
distribution is said to be “skewed” when the mean, the median, and
the mode fall at different points in the distribution, and the balance
(or center of gravity) is shifted to one side or the other, to right or
left. It is important to know (1) whether the skewness which often
occurs in distributions of test scores and other measures represents a
real divergence from the normal form; or (2) whether such divergence
is the result of chance fluctuations, arising from temporary
causes, and is not significant of real discrepancy. The degree of displacement
or skewness in a frequency distribution may be determined
by the formula


$$ Sk = \frac{3(\text{mean} - \text{median})}{\sigma} \qquad (19) $$


(a measure of skewness in a frequency distribution)

In a normal distribution the mean equals the median and the skew- 
ness is 0. The more nearly the distribution approaches the normal 
form, the closer together are the mean and the median, and the less 
the skewness. Distributions are said to be skewed negatively, or to 
the left, when the scores are massed at the high end of the scale (the 
right end), and spread out gradually at the low or left end, as shown 


FIG. 23 Negative skewness: to the left
(The mean falls to the left of the median.)


FIG. 24 Positive skewness: to the right
(The median falls to the left of the mean.)
in Figure 23. Distributions are skewed positively, or to the right,
when the scores are massed at the low (the left) end of the scale, and
spread out gradually toward the high or right end, as shown in
Figure 24.

If we apply formula (19) to the distribution of 50 Army Alpha
scores in Table 1, page 5, −.28 is obtained as a measure of skewness.
This result points to a slight negative skewness in the data,
which may be seen by reference to Figure 2, page 11. Formula (19)
gives the measure of skewness for the distribution of the 200 cancellation
scores (Table 3, page 13) as .009. This negligible degree of
positive skewness shows how closely this distribution approaches the
symmetrical probability form.
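
Formula (19) is a one-line computation. In the sketch below the function name `skewness_19` is our own, and the figures used (mean 68.10, median 68.75, SD 12.50) are those given in the answers for distribution (1) of the problems on Chapters 1-4; the resulting Sk is our illustrative calculation and is not a value stated in the text.

```python
def skewness_19(mean, median, sigma):
    """Formula (19): Sk = 3(mean - median) / sigma."""
    return 3 * (mean - median) / sigma


# Distribution (1) of the problems on Chapters 1-4.
sk = skewness_19(mean=68.10, median=68.75, sigma=12.50)
print(round(sk, 3))
```

The negative sign indicates that the mean lies below the median, i.e., a slight skewness to the left.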

Another measure of skewness is given by the formula 


$$ Sk = \frac{P_{90} + P_{10}}{2} - P_{50} \qquad (20) $$


(a measure of skewness in terms of percentiles) *


For the normal distribution Sk by formula (20) is zero: P50 lies
just midway between P10 and P90.

Applying this formula to the distributions of 50 Army Alpha scores 
and 200 cancellation scores, we obtain for the first Sk = —2.50; and 
for the second Sk = .03. These results are numerically different from 
the measures of skewness obtained from formula (19), because the 
two measures of skewness are computed from different reference 
values in the distribution, and hence are not directly comparable. 
The two formulas agree, however, in indicating some negative skew- 
ness for the distribution of 50 Alpha scores, and an insignificant 
degree of positive skewness for the 200 cancellation scores. In com- 
paring the skewness of two distributions we should use either for- 
mula (19) or (20); not first the one and then the other. 
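
Formula (20) may be applied to ungrouped scores as in the sketch below. Several things here are assumptions of ours: the function name `skewness_20`, the two small sets of scores (which are invented for illustration), and the use of `statistics.quantiles`, whose interpolation rule is not the grouped-frequency method of this book and may therefore give slightly different percentiles from a hand calculation.

```python
from statistics import quantiles


def skewness_20(scores):
    """Formula (20): Sk = (P90 + P10) / 2 - P50, from ungrouped scores."""
    deciles = quantiles(scores, n=10)  # nine cut points: P10, P20, ..., P90
    p10, p50, p90 = deciles[0], deciles[4], deciles[8]
    return (p90 + p10) / 2 - p50


symmetrical = list(range(1, 100))  # hypothetical scores 1, 2, ..., 99
right_tailed = [1, 1, 1, 2, 2, 2, 3, 3, 4, 5, 6, 8, 12, 20, 40]  # hypothetical

print(skewness_20(symmetrical))
print(skewness_20(right_tailed))
```

The symmetrical set gives Sk = 0, since P50 lies midway between P10 and P90; the set with a long tail toward the high scores gives a positive Sk.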

The important question of how much skewness a distribution must 
exhibit before it may be said to be significantly skewed cannot be 
answered until we have calculated a “standard error” of our measure 
of skewness. A formula for the standard error of Sk, when deter- 
mined by formula (20), and a method of testing whether the skew- 
ness of a given distribution is significant are discussed in Chapter 9, 
page 241.

* Kelley, T. L., Statistical Method (New York: Macmillan, 1923), p. 77. The
terms in this formula, as given by Kelley, have been reversed so that the sign of
Sk will agree with the conventional notion of positive and negative skewness.
2. Kurtosis


The term kurtosis refers to the “peakedness” or flatness of a frequency
distribution as compared with the normal. A frequency distribution
more peaked than the normal is said to be leptokurtic; one
flatter than the normal, platykurtic. Figure 25 shows a leptokurtic


FIG. 25 Leptokurtic (A), normal or mesokurtic (B), and platykurtic
(C) curves. Horizontal axis: −3σ, −2σ, −1σ, 0, +1σ, +2σ, +3σ.


distribution and a platykurtic distribution plotted on the same dia- 
gram around the same mean. A normal curve (called mesokurtic) 
has also been drawn in on the diagram to bring out the contrast in 
the figures, and to make comparison easier. A formula for measuring 
kurtosis is 


$$ Ku = \frac{Q}{P_{90} - P_{10}} \qquad (21) $$

(a measure of kurtosis in terms of percentiles)


For the normal curve, formula (21) gives Ku = .263.* If Ku is
greater than .263 the distribution is platykurtic; if less than .263 the
distribution is leptokurtic. Calculating the kurtosis of the distributions
of fifty Alpha scores and 200 cancellation scores, discussed
above, we obtain Ku = .237 for the first distribution, and Ku = .229


* From Table A, PE (Q) = .6745σ, P90 = 1.28σ, and P10 = −1.28σ. Hence by
formula (21)

$$ Ku = \frac{.6745}{1.28 - (-1.28)} = .263 $$
for the second. Both distributions, therefore, are slightly leptokurtic.
To determine whether the kurtosis in a distribution is significant,
that is, whether the curve is too high or too flat to be treated as sensibly
normal, we must evaluate Ku in terms of its standard error. A
formula for the standard error of Ku, and a method of determining
the significance of an obtained measure of Ku will be given in Chapter 9,
page 242.
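
The normal value of Ku can be confirmed from the percentiles of the unit normal curve. In the following sketch the function name `kurtosis_21` is ours, and Q is taken as the quartile deviation, one-half the distance between P75 and P25.

```python
from statistics import NormalDist


def kurtosis_21(p10, p25, p75, p90):
    """Formula (21): Ku = Q / (P90 - P10), with Q = (P75 - P25) / 2."""
    q = (p75 - p25) / 2
    return q / (p90 - p10)


unit_normal = NormalDist()
ku_normal = kurtosis_21(
    unit_normal.inv_cdf(0.10),
    unit_normal.inv_cdf(0.25),
    unit_normal.inv_cdf(0.75),
    unit_normal.inv_cdf(0.90),
)
print(round(ku_normal, 3))
```

Percentiles obtained from any actual distribution may be passed to the same function and the result compared with .263.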


3. Comparing a given histogram or frequency polygon with a normal 
curve of the same area, M and σ


In this section methods will be described for superimposing on a
given histogram or frequency polygon a normal curve of the same N,
M, and σ as the actual distribution. Such a normal curve is the “best
fitting” normal distribution for the given data. The research worker
often wishes to compare his distribution “by eye” with that normal
curve which “best fits” the data, and such a comparison may profitably
be made even if no measures of divergence from normality are
computed. In fact, the direction and extent of asymmetry often
strike us more convincingly when seen in a graph than when expressed
by measures of skewness and kurtosis. It may be noted that
a normal curve can always be readily constructed by following the
procedures given here, provided the area (N) and variability (σ)
are known.


TABLE 16 Frequency distribution of the scores made by 206 freshmen 
on the Thorndike Intelligence Examination 


Scores      f

115-119     1
110-114     2
105-109     4
100-104    10     Mean   = 81.59
 95-99     13     Median = 81.00
 90-94     18     σ      = 12.14
 85-89     34
 80-84     30
 75-79     37
 70-74     27
 65-69     15
 60-64     10
 55-59      2
 50-54      2
 45-49      1
Table 16 shows the frequency distribution of scores made on the
Thorndike Intelligence Examination by 206 college freshmen. The
mean is 81.59, the median 81.00, and the σ 12.14. This frequency
distribution has been plotted in Figure 26, and over it, on the same
axes, has been drawn in the best-fitting normal curve, i.e., the normal
curve which best describes these data. The Thorndike scores are represented
by a histogram instead of by a frequency polygon in order
to prevent coincidence of the surface outlines and to bring out more
clearly agreement and disagreement at different points. To plot a
normal curve over this histogram, we first compute the height of the
maximum ordinate (y0) or the frequency at the middle of the distribution.
The maximum ordinate (y0) can be determined from the
equation of the normal curve given on page 94. When x in this equation
is put equal to zero (the x at the mean of the normal curve is 0),
the term e^(−x²/2σ²) equals 1.00, and y0 = N/(σ√2π). In the present problem,
N = 206; σ = 2.43 * (in units of class-interval), and √2π = 2.51;
hence y0 = 33.8 (see Fig. 26 for calculations). Knowing y0, we are


SRESBSBLSS 


- 
to 


FIG. 26 Frequency distribution of the scores of 206 freshmen on the 
Thorndike Intelligence Examination, compared with best-fitting 
normal curve for same data 
(For data, see Table 16.) 
*o = 2.43 X 5 (interval). The o in interval units is used in the equation, 


since the units on the X-azis are in terms of class-intervals. 
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NonMaL Curve ORDINATES AT Mean, +10, +20, +30 


IEN 206 
Yo = VAR 243 X 251 


3x2 = 33.8 
551005 DOSS 0600.8 2.20 


+ 20 = .13584 X 33. 4. 
+ Зо = .01111 X 33.8 . 


5 
6 
4 
able to compute from Table B the heights of ordinates at given dis- 
tances from the mean. Тһе entries in Table В give the heights of the 
ordinates in the normal probability curve, at various o-distances 
from the mean, expressed as fractions of the maximum or middle 
ordinate taken equal to 1.00000. То find, for example, the height of 
the ordinate at 2-16, we take the entry .60653 from the table opposite 
2/0 = 10. This means that when the maximum central ordinate 
(Yo) is 1.00000, the ordinate (ie., frequency) +10 removed from 
M is .60653; or the frequency at +10 is about 61% of the maximum 
frequency at the middle of the distribution. In Figure 26 the ordi- 
nates --16 from M are .60653 X 33.8 (уо) or 20.5 The ordinates 
2-20 from М are .13534 X 33.8 or 4.6; and the ordinates +30 from 
М аге .01111 X 33.8 or 4. 

The normal curve may be sketched in without much difficulty
through the ordinates at these seven points. Somewhat greater
accuracy may be obtained if various intermediate ordinates, for
example, at ±.5σ, ±1.5σ, etc., are also plotted. The ordinates for the
curve in Figure 26 at ±.5σ are .88250 × 33.8 or 29.8; at ±1.5σ,
.32465 × 33.8 or 11.0, etc.
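
The ordinates worked out above can be checked with a short computation.
The following is only a sketch: the function names are our own, and the
fractions of Table B are replaced by the equivalent expression
e^(-z²/2), where z is the distance from the mean in σ-units.

```python
import math

def max_ordinate(n, sigma_intervals):
    """Height y0 of the normal curve at the mean: N / (sigma * sqrt(2*pi)).

    sigma_intervals is the sigma expressed in class-interval units.
    """
    return n / (sigma_intervals * math.sqrt(2 * math.pi))

def ordinate(y0, z):
    """Height of the curve z sigma-units from the mean (a Table B fraction times y0)."""
    return y0 * math.exp(-z * z / 2)

# Table 16: N = 206, sigma = 2.43 class-intervals.
y0 = max_ordinate(206, 2.43)
heights = {z: round(ordinate(y0, z), 1) for z in (0, 0.5, 1, 1.5, 2, 3)}
print(heights)
```

Because the curve is symmetrical, each height serves for both the plus
and the minus distance from the mean.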

From formula (20) the skewness of our distribution of 206 scores
is found to be 1.25. This small value indicates a low degree of
positive skewness in the data. The kurtosis of the distribution by
formula (21) is .244, and the distribution appears to be slightly
leptokurtic (this is shown by the "peak" rising above the normal curve).
Neither measure of divergence, however, is significant of a "real"
discrepancy between our data and those of the normal distribution
(see p. 212). On the whole, then, the normal curve plotted in
Figure 26 fits the obtained distribution well enough to warrant our
treating these data as sensibly normal.


IV. Applications of the Normal Probability Curve

This section will consider a number of problems which may readily
be solved if one can assume that the distributions of scores may be
treated as normal, or at least as approximately normal. Each general
problem will be illustrated by several examples. These examples are
intended to present the issues concretely, and should be carefully
worked through by the student. Constant reference will be made to
Table A; and a knowledge of how to use this table is essential.


1. To determine the percentage of cases in a normal distribution which 
fall within given limits 


Example (1) Given a normal distribution with a mean of 12, and
a σ of 4. (a) What percentage of the cases fall between 8 and 16?
(b) What percentage of the cases lie above 18? (c) Below 6?


(a) A score of 16* is four points above the mean, and a score of 8
is four points below the mean. If we divide this scale distance of four
score units by the σ of the distribution (i.e., by 4) it is clear that 16
is 1σ above the mean, and that 8 is 1σ below the mean (see Fig. 27,
below). There are 68.26% of the cases in a normal distribution
between the mean and ±1σ (Table A). Hence, 68.26% of the scores
in this distribution, or approximately the middle two-thirds, fall
between 8 and 16. This result may also be stated in terms of
"chances." Since 68.26% of the cases in the given distribution fall
between 8 and 16, the chances are about 68 in 100 that any score in
the distribution will be found between these points.

(b) The upper limit of a score of 18, namely, 18.5, is 6.5 score units
or 1.625σ above the mean (6.5/4 = 1.625). From Table A we find
that 44.79% of the cases in the entire distribution fall between the
mean and 1.625σ. Accordingly, 5.21% of the cases (50.00 - 44.79)
must lie above the upper limit of 18 (viz., 18.5) in order to fill out
the 50% of cases in the upper half of the normal curve (Fig. 27). In
terms of chances, there are about 5 chances in 100 that any score in
the distribution will be larger than 18.

(c) The lower limit of a score of 6, namely 5.5, is -1.625σ from the
mean. Between the mean and 5.5 (-1.625σ) are 44.79% of the cases
in the whole distribution. Hence, about 5% of the cases in the
distribution lie below 5.5 (filling out the 50% below the mean), and the
chances are about 5 in 100 that any score in the distribution will be
less than 6, i.e., below the lower limit of score 6.

* A score of 16 is the midpoint of the interval 15.5 to 16.5.
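
The three answers of Example (1) may be verified without Table A. The
sketch below assumes Python's standard-library NormalDist class as a
stand-in for the table; the variable names are ours. The area between
8 and 16 comes out a shade above the tabled 68.26% because the table
is rounded.

```python
from statistics import NormalDist

dist = NormalDist(mu=12, sigma=4)

# (a) Between the score points 8 and 16, i.e., within one sigma of the mean.
between_8_and_16 = dist.cdf(16) - dist.cdf(8)

# (b) Above 18.5, the upper limit of the score 18.
above_18 = 1 - dist.cdf(18.5)

# (c) Below 5.5, the lower limit of the score 6.
below_6 = dist.cdf(5.5)

print(f"{between_8_and_16:.2%}  {above_18:.2%}  {below_6:.2%}")
```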

Example (2) Given a normal distribution with a mean of 29.75
and a σ of 6.75. What percentage of the distribution will lie
between 22 and 26? What are the chances that a score will be
between these two points?


A score of 22* is 7.75 score units or -1.15σ (7.75/6.75 = 1.15)
from the mean; and a score of 26 is 3.75 or -.56σ from the mean
(Fig. 28, above). We know from Table A that 37.49% of the cases
in a normal distribution lie between the mean and -1.15σ; and that
21.23% of the cases lie between the mean and -.56σ. By simple
subtraction, therefore, 16.26% of the cases fall between -1.15σ and
-.56σ, or between the scores 22 and 26. The chances are 16 in 100
that any score in the distribution will lie between these two points.

* A score of 22 is the midpoint of the interval 21.5 to 22.5.
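
Example (2) rounds each σ-distance to two decimals before entering
Table A. The following sketch, again assuming the standard-library
NormalDist in place of the table and using names of our own, repeats
the subtraction with rounded and with unrounded distances to show how
little the rounding matters.

```python
from statistics import NormalDist

mean, sigma = 29.75, 6.75
unit_normal = NormalDist()  # mean 0, sigma 1

# Sigma-distances rounded to two decimals, as when reading a printed table.
z_low = round((22 - mean) / sigma, 2)
z_high = round((26 - mean) / sigma, 2)
share_rounded = unit_normal.cdf(z_high) - unit_normal.cdf(z_low)

# The same area computed from the unrounded distances.
scores = NormalDist(mean, sigma)
share_exact = scores.cdf(26) - scores.cdf(22)

print(z_low, z_high, f"{share_rounded:.2%}", f"{share_exact:.2%}")
```

Either way the chances are about 16 in 100.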


2. To find the limits in any normal distribution which will include a given 
percentage of the cases 


Example (1) Given a normal distribution with a mean of 16.00
and a σ of 4.00. What limits will include the middle 75% of the
cases?


The middle 75% of the cases in a normal distribution must include
the 37.5% just above, and the 37.5% just below the mean. From
Table A we find that 3749 cases in 10,000, or 37.5% of the
distribution, fall between the mean and 1.15σ; and, of course, 37.5% of the
distribution also fall between the mean and -1.15σ. The middle
75% of the cases, therefore, lie between the mean and ±1.15σ; or,
since σ = 4.00, between the mean and ±4.60 score units. Adding
±4.60 to the mean (to 16.00), we find that the middle 75% of the
scores in the given distribution lie between 20.60 and 11.40 (see Fig.
29, below).


FIG. 29 (baseline points at 11.40, 16.00, and 20.60; σ = 4.00)
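
Finding limits from a given percentage is the reverse of the first
problem: we enter with an area and read off a σ-distance. As a sketch
only, the inverse reading of Table A is here assumed to be supplied by
NormalDist.inv_cdf from the standard library; the helper name
central_limits is our own.

```python
from statistics import NormalDist

def central_limits(mean, sigma, share):
    """Score limits that enclose the middle `share` of a normal distribution."""
    tail = (1 - share) / 2          # area left outside at each end
    dist = NormalDist(mean, sigma)
    return dist.inv_cdf(tail), dist.inv_cdf(1 - tail)

low, high = central_limits(16.00, 4.00, 0.75)
print(round(low, 2), round(high, 2))
```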


Example (2) Given a normal distribution with a median of
150.00 and a PE(Q) of 17. What limits will include the highest
20% of the distribution? the lowest 10%?


We know from page 97 that σ = 1.4826 PE; hence the σ of this
distribution is 25.20 (1.4826 × 17). The highest 20% of a normally
distributed group will have 30% of the cases between its lower limit
and the median, since 50% of the cases lie in the right half of the
distribution. From Table A we know that 2995 cases in 10,000 or
30% of the distribution are between the median and .84σ. Since the σ
of the given distribution is 25.20, .84σ will be .84 × 25.20 or 21.17
score units above the median, or at 171.17. The lower limit of the
highest 20% of the given group, therefore, is 171.17; and the
upper limit is the highest score in the distribution, whatever that
may be.

The lowest 10% of a normally distributed group will have 40% of
the cases between the median and its upper limit. Almost exactly
40% of the distribution fall between the median and -1.28σ. Hence,
since σ = 25.20, -1.28σ must lie at -1.28 × 25.20 or 32.26 score
units below the median, that is, at 117.74. The upper limit of the
lowest 10% of scores in the group, accordingly, is 117.74; and the
lower limit is the lowest score in the distribution.
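
The same example may be run through by machine. This is a sketch under
the assumption that NormalDist.inv_cdf from the standard library
replaces Table A; the variable names are ours. Since the σ-distances
are not rounded to .84 and 1.28, the limits differ from those in the
text by a few hundredths of a score unit.

```python
from statistics import NormalDist

median, pe = 150.00, 17
sigma = 1.4826 * pe                 # sigma = 1.4826 PE
dist = NormalDist(median, sigma)    # in a normal distribution the mean equals the median

lower_limit_top20 = dist.inv_cdf(0.80)     # 80% of the cases lie below this point
upper_limit_bottom10 = dist.inv_cdf(0.10)  # 10% of the cases lie below this point

print(round(sigma, 2), round(lower_limit_top20, 2), round(upper_limit_bottom10, 2))
```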


3. To compare two distributions in terms of "overlapping" 


Example (1) Given the distributions of the scores made on a
logical memory test by 300 boys and 250 girls (Table 17). The
boys' mean score is 21.49 with a σ of 3.63. The girls' mean score
is 23.68 with a σ of 5.12. The medians are: boys, 21.41, and girls,
23.66. What percentage of boys exceed the median of the girls'
distribution?


On the assumption that these distributions are sensibly normal, we
may solve this problem by means of Table A. The girls' median is
23.66 - 21.49 or 2.17 score units above the boys' mean. Dividing
2.17 by 3.63 (the σ of the boys' distribution), we find that the girls'
median is .60σ above the mean of the boys' distribution. Table A
shows that 23% of a normal distribution lie between the mean and
.60σ; hence 27% of the boys (50% - 23%) exceed the girls'
median.
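
Under the assumption of normality the overlap is a single tail area. A
minimal sketch, with NormalDist from the standard library assumed in
place of Table A and with variable names of our own:

```python
from statistics import NormalDist

boys = NormalDist(mu=21.49, sigma=3.63)
girls_median = 23.66

z = (girls_median - boys.mean) / boys.stdev    # sigma-distance of the girls' median
share_exceeding = 1 - boys.cdf(girls_median)   # boys scoring above that point

print(round(z, 2), f"{share_exceeding:.1%}")
```

The unrounded area is about 27.5%, which the text, working from 23%,
gives as 27%.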

This problem may also be solved by direct calculation from the
distributions of boys' and girls' scores without any assumption as to
normality of distribution. The calculations are shown in Table 17;
and it will be interesting to compare the result found by direct
calculation with that obtained by use of the probability tables. The
problem is to find the number of boys whose scores exceed 23.66, the
girls' median, and then turn this number into a percentage. There are
217 boys who score up to 23.5 (lower limit of 23.5 to 27.5). The
class-interval 23.5 to 27.5 contains 68 scores; hence there are 68/4
or 17 scores per scale unit on this interval. We wish to reach 23.66
in the boys' distribution. This point is .16 of a score unit
(23.66 - 23.50 = .16) above 23.5, or 2.72 scores (i.e., 17 × .16)
above 23.5. Adding 2.72 to 217, we find that 219.72 of the boys'
scores fall below 23.66, the girls' median. Since 300 - 219.72 =
80.28, it is clear that 80.28 ÷ 300 or 26.76% (approximately 27%) of
the boys exceed the girls' median. This result is in almost perfect
agreement with that obtained above. Apparently the assumption of
normality of distribution for the boys' scores was justified.

TABLE 17 To illustrate the method of determining overlapping by direct
calculation from the distribution

          Boys                          Girls
Scores            f           Scores            f
27.5 to 31.5     15           31.5 to 35.5     20
23.5 to 27.5     68           27.5 to 31.5     35
19.5 to 23.5    128           23.5 to 27.5     73
15.5 to 19.5     79           19.5 to 23.5     68
11.5 to 15.5     10           15.5 to 19.5     41
            N = 300           11.5 to 15.5     13
          N/2 = 150                       N = 250
                                        N/2 = 125

Mdn = 19.5 + (61/128) × 4     Mdn = 23.5 + (3/73) × 4
    = 21.41                       = 23.66
M   = 21.49                   M   = 23.68
σ   = 3.63                    σ   = 5.12

What percent of the boys exceed 23.66, the median of the girls? First,
217 boys make scores below 23.5. The class-interval 23.5-27.5 contains 68
scores; hence, there are 68/4 or 17 scores per scale unit on this interval.
The girls' median, 23.66, is .16 above 23.5, the lower limit of interval
23.5-27.5. If we multiply 17 (number of scores per scale unit) by .16 we
obtain 2.72, which is the number of scores between 23.5 and 23.66.
Adding 217 and 2.72, we obtain 219.72 as that part of the boys'
distribution which lies below 23.66. N is 300; hence
300 - 219.72 gives 80.28 as that part of the boys' distribution which lies
above 23.66. Dividing 80.28 by 300, we find that .2676, or approximately
27%, of the boys exceed the girls' median.

The agreement between the percentage of overlapping found by 
direct calculation from the distribution and that found by use of the 
probability tables will nearly always be close, especially if the groups 
are large and the distributions fairly symmetrical. When the over- 
lapping distributions are small and not very regular in outline, it is 
safer to use the method of direct calculation, since no assumption as 
to form of distribution is then made. 
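
The method of direct calculation amounts to counting whole intervals
and interpolating within the one interval that contains the point in
question. The sketch below applies it to the boys' distribution of
Table 17; the function name count_below and the tuple layout are our
own choices, and scores are assumed to be spread evenly within each
interval, as in the text.

```python
def count_below(point, intervals):
    """Number of cases below `point` in a grouped frequency distribution.

    `intervals` is a list of (lower_limit, upper_limit, frequency) tuples.
    Cases are assumed to be spread evenly within each interval.
    """
    total = 0.0
    for lower, upper, freq in intervals:
        if point >= upper:
            total += freq                                   # whole interval lies below
        elif point > lower:
            total += freq * (point - lower) / (upper - lower)  # part of the interval
    return total

# Boys' distribution from Table 17.
boys = [(11.5, 15.5, 10), (15.5, 19.5, 79), (19.5, 23.5, 128),
        (23.5, 27.5, 68), (27.5, 31.5, 15)]

n_boys = sum(freq for _, _, freq in boys)
below = count_below(23.66, boys)            # scores below the girls' median
share_above = (n_boys - below) / n_boys

print(round(below, 2), f"{share_above:.2%}")
```

No assumption as to the form of the distribution enters this
calculation, which is why it is the safer method for small or irregular
groups.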
4. To determine the relative difficulty of test questions, problems, and
other test items


Example (1) Given a test question or problem solved by 10% of
a large unselected group; a second problem solved by 20% of the
same group; and a third problem solved by 30%. If we assume the
capacity measured by the test problems to be distributed normally,
what is the relative difficulty of questions 1, 2, and 3?


Our first task is to find for Question 1 a position in the distribution,
such that 10% of the entire group (the percent passing) lie above,
and 90% (the percent failing) lie below the given point. The highest
10% in a normally distributed group has 40% of the cases between
its lower limit and the mean (see Fig. 30, above). From Table A we
find that 39.97% (i.e., 40%) of a normal distribution fall between
the mean and 1.28σ. Hence, Question 1 belongs at a point on the
baseline of the curve, a distance of 1.28σ from the mean; and,
accordingly, 1.28σ may be set down as the difficulty value of this question.

Question 2, passed by 20% of the group, falls at a point in the
distribution 30% above the mean. From Table A it is found that
29.95% (i.e., 30%) of the group fall between the mean and .84σ;
hence, Question 2 has a difficulty value of .84σ. Question 3, which
lies at a point in the distribution 20% above the mean, has a difficulty
value of .52σ, since 19.85% of the distribution fall between the mean
and .52σ. To summarize our results:


Question    Passed by    σ-value    σ-difference
   1           10%         1.28          -
   2           20%          .84         .44
   3           30%          .52         .32


The σ-difference in difficulty between Questions 2 and 3 is .32, which
is roughly 3/4 of the σ-difference in difficulty between Questions 1
and 2. Since the percentage difference is the same in the two
comparisons, it is evident that when ability is assumed to follow the normal
distribution, σ-differences and not percentage differences are the
better indices of differences in difficulty.
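
The σ-values in the summary table can be reproduced directly from the
percentages passing. In this sketch the helper name difficulty is our
own, and NormalDist.inv_cdf from the standard library is assumed to do
the work of Table A.

```python
from statistics import NormalDist

def difficulty(share_passing):
    """Sigma-value of an item: the baseline point with `share_passing` of the group above it."""
    return NormalDist().inv_cdf(1 - share_passing)

values = [round(difficulty(p), 2) for p in (0.10, 0.20, 0.30)]
# Difference in difficulty between each question and the next easier one.
gaps = [round(harder - easier, 2) for harder, easier in zip(values, values[1:])]

print(values, gaps)
```

Equal steps of ten percent thus give unequal steps on the σ-scale.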


Example (2) Given three test items, 1, 2, and 3, passed by 50%,
40%, and 30%, respectively, of a large group. On the assumption
of normality of distribution, what percentage of this group must
pass test item 4, in order for it to be as much more difficult than
3, as 2 is more difficult than 1?


An item passed by 50% of a group is, of course, failed by 50%;
and, accordingly, such an item falls exactly in the middle of a normal
distribution of "difficulty." Test item 1, therefore, has a σ-value of
.00, since it falls exactly at the mean (Fig. 31). Test item 2 lies at a
point in the distribution 10% above the mean, since 40% of the
group passed and 60% failed this item. Accordingly, the σ-value of
item 2 is .25, since from Table A we find that 9.87% (roughly 10%)
of the cases lie between the mean and .25σ. Test item 3, passed by
30% of the group, lies at a point 20% above the mean, and this item
has a difficulty value of .52σ, as 19.85% (20%) of the normal
distribution fall between the mean and .52σ.

FIG. 31 (baseline points marked at .25σ, .52σ, and .77σ)

Since item 2 is .25σ farther along on the difficulty scale (toward
the high-score end of the curve) than item 1, it is clear that item 4
must be .25σ above item 3, if it is to be as much harder than item 3
as item 2 is harder than item 1. Item 4, therefore, must have a value
of .52σ + .25σ or .77σ; and from Table A we find that 27.94% (28%)
of the distribution fall between the mean and this point. This means
that 50% - 28% or 22% of the group must pass item 4. To
summarize:


Test Item    Passed by    σ-value    σ-difference
    1           50%         .00           -
    2           40%         .25          .25
    3           30%         .52           -
    4           22%         .77          .25


A test item, therefore, must be passed by 22% of the group in order
for it to be as much more difficult than an item passed by 30%, as an
item passed by 40% is more difficult than one passed by 50%. Note
again that percentage differences are not reliable indices of
differences in difficulty when the capacity measured is distributed
normally.
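
Example (2) runs the conversion in both directions: percentages are
turned into σ-values, a step is added on the σ-scale, and the result
is turned back into a percentage. A sketch, assuming the
standard-library NormalDist in place of Table A and using names of our
own:

```python
from statistics import NormalDist

unit_normal = NormalDist()

def sigma_value(share_passing):
    """Baseline point with `share_passing` of the group above it."""
    return unit_normal.inv_cdf(1 - share_passing)

z1, z2, z3 = (sigma_value(p) for p in (0.50, 0.40, 0.30))
z4 = z3 + (z2 - z1)                  # item 4 is as far above item 3 as item 2 is above item 1
share_passing_4 = 1 - unit_normal.cdf(z4)

print(round(z4, 2), f"{share_passing_4:.0%}")
```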


5. To separate a given group into sub-groups according to capacity, 
when the trait is normally distributed 


Example (1) Suppose that we have administered a certain ex- 
amination to 100 college students. We wish to classify our group 
into five sub-groups A, B, C, D, and E according to ability, the 
range of ability to be equal in each sub-group. On the assumption 
that the trait measured by our examination is normally dis- 
tributed, how many students should be placed in groups A, B, C, 
D, and E? 


Let us first represent the positions of the five sub-groups
diagrammatically on a normal curve as shown in Figure 32, below. If the
baseline of the curve is considered to extend from -3σ to +3σ, that
is, over a range of 6σ, dividing this range by 5 (the number of
sub-groups) gives 1.2σ as the baseline extent to be allotted to each group.
These five intervals may be laid off on the baseline as shown in the
figure, and perpendiculars erected to demarcate the various
sub-groups. Group A covers the upper 1.2σ; group B the next 1.2σ;
group C lies .6σ to the right and .6σ to the left of the mean; groups
D and E occupy the same relative positions in the lower half of the
curve that B and A occupy in the upper half.


FIG. 32


To find what percentage of the whole group belongs in A we must
find what percentage of a normal distribution lies between 3σ (upper
limit of the A group) and 1.8σ (lower limit of the A group). From
Table A 49.86% of a normal distribution is found to lie between the
mean and 3σ; and 46.41% between the mean and 1.8σ. Hence, 3.5%
of the total area under the normal curve (49.86% - 46.41%) lie
between 3σ and 1.8σ; and, accordingly, group A comprises 3.5% of
the whole group.

The percentages in the other groups are calculated in the same
way. Thus, 46.41% of the normal distribution fall between the mean
and 1.8σ (upper limit of group B) and 22.57% fall between the
mean and .6σ (lower limit of group B). Subtracting, we find that
46.41% - 22.57% or 23.84% of our distribution belong in
sub-group B. Group C lies from .6σ above to -.6σ below the mean.
Between the mean and .6σ are 22.57% of the normal distribution, and
the same percent lies between the mean and -.6σ. Group C,
therefore, includes 45.14% (22.57 × 2) of the distribution. Finally,
sub-group D, which lies between -.6σ and -1.8σ, contains exactly the
same percentage of the distribution as sub-group B; and group E,
which lies between -1.8σ and -3σ, contains the same percent of the
whole distribution as group A. The percentage and number of men
in each group are given in the following table:


                                              Groups
                                  A       B      C      D       E
Percent of total in each group   3.5     23.8   45     23.8    3.5
Number in each group            4 or 3   24     45     24     4 or 3

(100 men in all)


On the assumption that the capacity measured follows the normal 
curve, it is clear that three to four men in our group of 100 should be 
placed in group A, the “marked” ability group; twenty-four in 
group B, the "high average" ability group; forty-five in group C, 
the "average" ability group; twenty-four in group D, the “low aver- 
age" ability group; and three or four in group E, the "very low” or 
"inferior" group. 

The above procedure may be used to determine how many students 
in a class should be assigned to each of any given number of grade- 
groups. It must be remembered that the assumption is made that 
performance in the subject matter upon which the individuals are 
being marked is represented by the normal curve. The larger and 
more unselected the group the more nearly is this assumption 
justified. 
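
The procedure extends to any number of grade-groups. The following
sketch divides the baseline from -3σ to +3σ into equal segments and
finds the share of the distribution over each; the function name
group_shares is ours, and NormalDist from the standard library is
assumed in place of Table A.

```python
from statistics import NormalDist

def group_shares(k, span=3.0):
    """Share of a normal distribution in each of k equal baseline segments.

    The baseline is taken to run from -span to +span sigma; the shares are
    ordered from the lowest group to the highest.
    """
    unit_normal = NormalDist()
    width = 2 * span / k
    cuts = [-span + i * width for i in range(k + 1)]
    return [unit_normal.cdf(hi) - unit_normal.cdf(lo) for lo, hi in zip(cuts, cuts[1:])]

shares = group_shares(5)                      # groups E, D, C, B, A in that order
percents = [round(100 * s, 1) for s in shares]
print(percents)
```

The shares fall a little short of 100% in total, since the small area
beyond ±3σ is left out, exactly as in the worked example.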




V. Why Frequency Distributions Deviate from the 
Normal Form 


It is often important for the research worker to know why his
distributions diverge from the normal form, and this is especially true
when the deviation from normality is large and significant (p. 212).
The reasons why distributions exhibit skewness and kurtosis are
numerous and often complex, but a careful analysis of the data will
often permit the setting up of hypotheses concerning non-normality
which may be tested experimentally. Common causes of asymmetry,
all of which must be taken into consideration by the careful
experimenter, will be summarized in the present section.
1. Unrepresentative or biased sampling


Selection is a potent cause of skewness. We should hardly expect
the distribution of I.Q.'s obtained from a group of twenty-five
ten-year-old boys (all superior students) to be normal; nor would we
look for symmetry in the distribution of I.Q.'s obtained from a special
class of dull-normal ten-year-old boys, even though the group were
fairly large. Neither of these groups is an unbiased selection (i.e., a
cross-section) from the population of ten-year-old boys; and in
addition, the first group is quite small. A small sample is not necessarily
unrepresentative, but more often than not it is apt to be.

Selection will produce skewness and kurtosis in distributions even
when the test has been adequately constructed and carefully
administered. For example, a group of elementary school pupils which
contains (a) a large proportion of bilinguals, (b) many children of very
low or very high socio-economic status, (c) a large number of pupils
over-age for grade or accelerated, will almost surely return skewed
distributions of test scores even upon standard intelligence and
educational achievement examinations.

Scores made by small and homogeneous groups are likely to yield 
leptokurtic distributions; while scores from large and heterogeneous 
groups are more likely to be platykurtic. The distribution of scores 
achieved upon an educational examination by pupils throughout the 
elementary grades, as well as the distribution of chronological ages 
for these same pupils, will probably be somewhat flattened owing to 
the considerable overlap from grade to grade. 

Distributions of physical traits, such as height, weight, and
strength, are also affected by selection. Measurements of physical
traits in large groups of the same age, sex, and race will closely
approximate the normal form (p. 85). But the distribution of height
for fourteen-year-old girls in the high school of a small city, or the
distribution of weight for freshmen in a midwestern college, will
probably be skewed, as these groups are subject to selection in various
traits related to height and weight.


2. Use of unsuitable or poorly made tests 


If a test is too easy, scores will pile up at the high-score end of the 
distribution, while if the test is too hard scores will pile up at the low- 
score end. Imagine, for example, that an examination in arithmetic 
which requires only addition, subtraction, multiplication, and
division, has been given to 1000 seventh graders. The resulting
distribution will almost certainly be badly skewed to the left (see Fig. 23).
On the other hand, if the examination contains only problems in 
complex fractions, interest, square root, and the like, the score dis- 
tribution is likely to be positively skewed—low scores will be more 
numerous than intermediate or high scores. It is probable also that 
both distributions will be somewhat more “peaked” (leptokurtic) 
than the normal. 

Asymmetry in cases like these may be explained in terms of those 
small positive and negative factors which determine the normal dis- 
tribution. Too easy a test excludes from operation some of the factors 
which would make for an extension of the curve at the upper end, 
such as knowledge of more advanced arithmetical processes which 
the brighter child would know. Too hard a test excludes from opera- 
tion factors which make for the extension of the distribution at the 
low end, such as knowledge of those very simple facts which would 
have permitted the answering of a few at least of the easier questions 
had these been included. In the first case we have a number of per- 
fect scores and little discrimination; in the second case a number of 
zero scores and equally poor differentiation. Besides the matter of 
difficulty in the test, asymmetry may be brought about by ambigu- 
ous or poorly made items and by other technical faults.* 


3. The measurement of traits the distributions of which are not normal 


Skewness or kurtosis or both may also appear owing to a real lack
of normality in the trait being measured. Non-normality of
distribution will arise, for instance, when some of the hypothetical
factors determining performance in a trait are dominant or prepotent
over the others, and hence are present more often than chance will
allow. Illustrations may be found in distributions resulting from the
throwing of loaded dice. When off-center or biased dice are cast the
resulting distribution will certainly be skewed and probably peaked,
owing to the greater likelihood of combinations of faces yielding
extreme scores. The same is true of biased coins. Suppose, for
example, that the probability of "success" (appearance of H) is four
times the probability of failure (non-occurrence of H, or presence of
T), so that p = 4/5, q = 1/5, and (p + q) = 1.00. If we think of the
factors making for success or failure as 3 in number, we may expand
(p + q)³ to find the incidence of success and failure in varying
degree. Thus, (p + q)³ = p³ + 3p²q + 3pq² + q³, and substituting
p = 4/5 and q = 1/5, we have

(1)  p³   = (4/5)³        = 64/125
     3p²q = 3(4/5)²(1/5)  = 48/125
     3pq² = 3(4/5)(1/5)²  = 12/125
     q³   = (1/5)³        =  1/125

(2)  Expressed as a frequency distribution:

     "Successes"     f
          3         64
          2         48
          1         12
          0          1
                N = 125

* Hawkes, Lindquist and Mann, The Construction and Use of Achievement
Examinations (Boston: Houghton Mifflin Co., 1936), Chapters II and III.

There is no reason why all distributions should approach the normal form.
Thorndike has written: "There is nothing arbitrary or mysterious about
variability which makes the so-called normal type of distribution a
necessity, or any more rational than any other sort, or even more to be
expected on a priori grounds. Nature does not abhor irregular
distributions." Theory of Mental and Social Measurement (New York:
Teachers College, 1913), pp.
9,


The numerators of the probability ratios (frequency of success) may
be plotted in the form of a histogram to give Figure 33.
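
The expansion of (p + q)³ can be carried out term by term with exact
fractions, which makes the numerators plotted in Figure 33 easy to
check. This is a sketch only; the function name binomial_terms is our
own.

```python
from fractions import Fraction
from math import comb

def binomial_terms(p, n):
    """Terms of (p + q)**n keyed by number of successes, from n down to 0."""
    q = 1 - p
    return {k: comb(n, k) * p**k * q**(n - k) for k in range(n, -1, -1)}

terms = binomial_terms(Fraction(4, 5), 3)
# Express each probability as a frequency out of N = 125 cases.
freqs = {k: int(t * 125) for k, t in terms.items()}
print(freqs)
```

With p = q = 1/2 the same function returns a symmetrical set of terms,
which is the case taken as the model for the normal curve.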


FIG. 33 Histogram of the expansion (p + q)³, where p = 4/5, q = 1/5.
p is the probability of success, q the probability of failure

FIG. 34 U-shaped frequency curve


Note that this distribution is negatively skewed (to the left); that
the incidence of three "successes" is 64, of two 48, of one 12, and of
none 1. J-shaped distributions like these are essentially non-normal.
Such curves have been most often found by psychologists to describe
certain forms of social behavior. For example, suppose that we
tabulate the number of students who appear at a lecture "on time"; and
the number who come in five, ten, and fifteen-plus minutes late. If
frequency of arrival is plotted against time, the distribution will be
highest at zero ("on time") on the Y-axis and will fall off rapidly as
we go to the right, i.e., will be positively skewed and J-shaped (see
Fig. 24). If only the early-comers are tallied, up to the "on time"
group, the curve will be negatively skewed like those in Figures 23
and 33. J-curves describe behavior which is essentially non-normal
in occurrence because the causes of the behavior differ greatly in
strength. But J-curves may also represent frequency distributions
badly skewed for other reasons. We have seen in (1) and (2) above
that selection and poorly chosen tests can produce distributions
which closely resemble J-curves.

Skewed curves often occur in medical statistics. The frequency of
death due to degenerative disease, for instance, is highest during 
maturity and old age and minimal during the early years. If age is 
laid off on the baseline and frequency of death plotted on the Y-axis 
the curve will be negatively skewed and will resemble Figure 23 
closely. Factors making for death are prepotent over those making 
for survival as age increases, and hence the curve is essentially asym- 
metrical. In the case of a childhood disease, the occurrence of death 
will be positively skewed when plotted against age as the probability 
of death becomes less with increase in age. 

Another type of non-normal distribution, which may be briefly 
described, is the U-shaped curve shown in Figure 34. U-shaped dis- 
tributions, like J-curves, are rarely encountered in mental and physi- 
cal measurement. They are sometimes found in the measurement of 
social and personality traits, if the group is extremely heterogeneous 
with respect to some attribute, or if the test measures a trait that is
likely to be present or absent in an all-or-none manner. Thus, in a 
group composed about equally of normals and mentally ill persons, 
the normals will tend to make low scores on a Neurotic Inventory 
while the abnormals will tend to make high scores—with considera- 
ble overlapping, of course. Again, in tests of suggestibility, if a 
subject yields to suggestion in the first trial he is likely to be sug- 
gestible in all trials—thus earning a high score. On the other hand, 
if he resists suggestion on the first trial, he is likely to resist in all 
subsequent trials—thus earning a zero (or a very low) score.* This 
all-or-none feature of the score makes for a U-shaped distribution. 


* See Hull, C. L., Hypnosis and Suggestibility (New York: Appleton-Century- 
Crofts, 1938), p. 68. 
4. The influence upon distribution form of errors made in the construction
and administration of tests


Other factors besides those already mentioned make for distortions 
in score distributions. Differences in the size of the units in which a
trait has been measured, for example, will lead to skewness. Thus, if 
the test items are very easy at the beginning and very hard later on, 
an increment of one point of score at the upper end of the test scale 
will be much greater than an increment of one point at the low end 
of the scale. The effect of such unequal or “rubbery” units is the 
same as that encountered when the test is too easy—scores tend to 
pile up toward the high end of the scale and be stretched out or 
skewed toward the low end. 

Errors in administration of a test, as in timing or giving
instructions; errors in the use of scoring stencils; large differences in
practice or in motivation among the subjects: all of these factors, if they
cause many students to score higher or lower than they normally
would, will make for skewness in the distribution.


PROBLEMS 


1. In two throws of a coin, what is the probability of throwing at least one 
head? 


2. What is the probability of throwing exactly one head in three throws 
of a coin? 


3. Five coins are thrown. What is the probability that exactly two of them 
will be heads? 


4. A box contains 10 red, 20 white and 30 blue marbles. After a thorough
shaking, a blindfolded person draws out 1 marble. What is the prob- 
ability that 
(a) it is blue? 

(b) red or blue? 
(c) neither red nor blue? 


5. If the probability of answering a certain question correctly is four 
times the probability of answering it incorrectly, what is the probability 
of answering it correctly? 


6. (a) If two unbiased dice are thrown, what is the probability that the 
number of spots showing will total 7? 
(b) Draw up a frequency distribution showing the occurrence of 
combinations of from 2 to 12 spots when two dice are thrown. 


7. (a) In an attitude questionnaire containing 10 statements, each to be 
marked as True or False, what is the probability of getting a 
perfect score by sheer guesswork? 
(b) Suppose you know 5 statements to be True and 5 False. What is 
the probability that you will mark the right ones True (select the 
right five)? 


8. A rat has five choices to make of alternate routes in order to reach the 
food-box. If it is true that for each choice the odds are two to one in 
favor of the correct pathway, what is the probability that the rat will 
make all of its choices correctly? 


9. Assuming that trait X is completely determined by 6 factors, all 
similar and independent, and each as likely to be present as absent, plot the 
distribution which one might expect to get from the measurement of 
trait X in an unselected group of 1000 people. 


10. Toss five pennies thirty-two times, and record the number of heads and 
tails after each throw. Plot frequency polygons of obtained and 
expected occurrences on the same axes. Compare the M's and σ's of 
obtained and expected distributions. 


11. What percentage of a normal distribution is included between the 
(a) mean and 1.54σ        (d) -3.5PE and 1.0PE 
(b) mean and -2.7PE       (e) .66σ and 1.78σ 
(c) -1.73σ and .56σ       (f) -1.8PE and -2.5PE 


12. In a normal distribution 
(a) Determine P27, P46, P54, and P81 in σ-units. 
(b) What are the percentile ranks of scores at -1.23σ, -.50σ, +.84σ? 


13. (a) Compute measures of skewness and of kurtosis for the first two 
frequency distributions in Chapter 2, Problem 1, page 40. 
(b) Fit normal probability curves to these same distributions, using 
the method given on page 102. 
(c) For each distribution, compare the percentage of cases lying 
between ±1σ with the 68.26% found in the normal distribution. 


14. Suppose that the height of the maximum ordinate (y₀) in a normal 
curve is 50. What is the height to the nearest integer of the ordinate 
at the x/σ point which cuts off the top 11% of the distribution? top 
30%? bottom 5%? (Use Tables A and B.) 


15. In a sample of 1000 cases the mean of a certain test is 14.40 and σ is 
2.50. Assuming normality of distribution 
(a) How many individuals score between 12 and 16? 
(b) How many score above 18? below 8? 
(c) What are the chances that any individual selected at random will 
score above 15? 


16. In the Army General Classification Test the distribution is essentially 
normal with M = 100 and SD = 20. 
(a) What percent of scores lie between 85 and 125? 
(b) The middle 60% fall between what two scores? 
(c) On what score does Q3 fall? 


17. In a certain achievement test, the seventh-grade mean is 28.00 and 
SD is 4.80; and the eighth-grade mean is 31.60 and SD is 4.00. What 
percent of the seventh grade is above the mean of the eighth grade? 
What percent of the eighth grade is below the mean of the seventh 
grade? 


18. Two years ago a group of twelve-year-olds had a reading ability 
expressed by a mean score of 40.00 and a σ of 3.60; and a composition 
ability expressed by a mean of 62.00 and a σ of 9.60. Today the group 
has gained 12 points in reading and 10.8 points in composition. How 
many times greater is the gain in reading than the gain in composition? 


19. In Problem 1, Chapter 4, we computed directly from the distribution 
the percent of Group A which exceeds the median of Group B. 
Compare this value with the percentage of overlapping obtained on the 
assumption of normality in Group A. 


20. Four problems, A, B, C, and D, have been solved by 50%, 60%, 70%, 
and 80%, respectively, of a large group. Compare the difference in 
difficulty between A and B with the difference in difficulty between 
C and D. 


21. In a certain college, ten grades, A+, A, A-; B+, B, B-; C+, C, 
C-; and D, are assigned. If ability in mathematics is distributed 
normally, how many students in a group of 500 freshmen should receive 
each grade? 


22. Assume that the distribution of grades in a class of 500 freshmen is 
normal with M = 72 and SD = 10. The instructor wants to give letter 
grades as follows: 10% A's; 30% B's; 40% C's; 15% D's; and 
5% F's. Compute to the closest score the divisions between A's and B's; 
B's and C's; C's and D's; D's and F's. 


ANSWERS 


1. 3/4     2. 3/8     3. 10/32 

4. (a) 1/2 
   (b) 2/3 
   (c) 1/3 

5. 4/5     6. (a) 1/6 

7. (a) 1/1024 
   (b) 1/252 

8. 32/243 

10. For expected distribution 
    M = 2.5, σ = 1.12 

11. (a) .4383   (d) .7409 
    (b) .4657   (e) .2171 
    (c) .6705   (f) .0665 

12. (a) -.61σ, -.10σ, .10σ, .88σ 
    (b) 11, 31, 80 

13. 

(a) Skewness Kurtosis 
By formula (19) By formula (20) By formula (21) 
(1) —.018 - 27 239 
(2) 1156 103 211 

    (c) 66%, 67% 

14. 23, 44, 13 

15. (a) 570 
    (b) 50; 3 
    (c) 33 in 100 or 1 in 3 

16. (a) 67% 
    (b) 83 and 117 
    (c) 113 

17. 23%; 18% 

18. Three times as great. 

19. 39% as compared with 42%. 

20. Difference between A and B is .25σ; between C and D, .32σ. 

21. Grades:              A+    A   A-   B+    B   B-   C+    C   C-    D 
    Students receiving:   3   14   40   80  113  113   80   40   14    3 

22. 85; 75; 64; 56 
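
Several of the probability answers above can be checked by direct enumeration. The following is only a sketch of such a check; the helper name `binom` and the dictionary keys are illustrative choices, not part of the text.

```python
from collections import Counter
from fractions import Fraction
from math import comb


def binom(n, k, p=Fraction(1, 2)):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


checks = {
    "1": 1 - binom(2, 0),             # at least one head in two throws
    "2": binom(3, 1),                 # exactly one head in three throws
    "3": binom(5, 2),                 # exactly two heads among five coins
    "7a": Fraction(1, 2) ** 10,       # ten True-False guesses all correct
    "7b": Fraction(1, comb(10, 5)),   # picking the right five of ten
    "8": Fraction(2, 3) ** 5,         # five choices, odds 2 to 1 each time
}

# Problem 6: totals of two unbiased dice (36 equally likely throws).
dice_totals = Counter(a + b for a in range(1, 7) for b in range(1, 7))
checks["6a"] = Fraction(dice_totals[7], 36)

for problem, value in checks.items():
    print(problem, value)
```

Fractions are printed in lowest terms, so the answer to Problem 3 appears as 5/16, which equals 10/32.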


CHAPTER 6 


LINEAR CORRELATION 


+ 


I. The Meaning of Correlation 


1. Correlation as a measure of relationship 


In previous chapters we have been concerned primarily with 
methods of computing statistical measures designed to represent in a 
reliable way the performance of an individual or a group in some 
defined trait. Frequently, however, it is of more importance to 
examine the relationship of one ability to another than to measure 
performance in either alone. Are certain abilities closely related, and 
others relatively independent? Is it true that good pitch discrimina- 
tion accompanies musical achievement; or that bright children tend 
to be less neurotic than average children? If we know the general 
intelligence of a child, as measured by a standard test, can we say 
anything about his probable scholastic achievement as represented 
by grades? Problems like these and many others which involve the 
relations among abilities can be studied by the method of correla- 
tion. 

When the relationship between two sets of measures is “linear,” 
i.e., can be described by a straight line,* the correlation between 
scores may be expressed by the “product-moment” coefficient of 
correlation, designated by the letter r. The method of calculating r will 
be outlined in Section III. Before taking up the details of calculation, 
let us make clear what correlation means, and how r measures 
relationship. 

* See pages 154-158 for a further discussion of “linear” relationship. 

Consider, first, a situation in which relationship is fixed and 
unchanging. The circumference of a circle is always 3.1416 times its 
diameter (C = 3.1416D), and this equation holds no matter how 
large or how small the circle, or in what part of the world we find it. 
Each time the diameter of a circle is increased or decreased, the 
circumference is increased or decreased by just 3.1416 times the same 
amount. In short, the dependence of circumference upon diameter is 
absolute; the correlation between the two dimensions is said to be 
perfect, and r = 1.00. In theory, at least, the relationship between 
two abilities, as represented by test scores, may also be perfect. 
Suppose that a hundred students have exactly the same standing in two 
tests: the student who scores first in the one test scores first in the 
other, the student who ranks second in the first test ranks second in 
the other, and this one-to-one correspondence holds throughout the 
entire list. The relationship is perfect, since the relative position of 
each subject is exactly the same in one test as in the other; and the 
coefficient of correlation is 1.00. 

Now let us consider the case in which there is just no correlation 
present. Suppose that we have administered to 100 college seniors 
the Army General Classification Test and a simple “tapping test” in 
which the number of separate taps made in thirty seconds is re- 
corded. Let the mean AGCT score for the group be 120, and the 
mean tapping rate be 185 taps in thirty seconds. Now suppose that 
when we divide our group into three sub-groups in accordance with 
the size of their AGCT scores, the mean tapping rate of the superior 
or “high” group (whose mean AGCT score is 130) is 184 taps in 
thirty seconds; the mean tapping rate of the “middle” group (whose 
mean AGCT score is 110) is 186 taps in thirty seconds; and the mean 
tapping rate of the “low” group (whose mean AGCT score is 100) is 
185 taps in thirty seconds. Since tapping rate is almost identical in 
all three groups, it is clear that from tapping rate alone we should 
be unable to draw any conclusion as to a student's probable performance 
upon AGCT. A tapping rate of 185 is as likely to be found with 
an AGCT score of 100 as with one of 120 or even 160. In other words, 
there is no correspondence between the scores made by the members 
of our group upon the two tests, and r, the coefficient of correlation, 
is zero.* 

Perfect relationship, then, is expressed by a coefficient of 1.00, and 
just no relationship by a coefficient of .00. Between these two limits, 
increasing degrees of relationship are indicated by such coefficients as 
.33, or .65, or .92. A coefficient of correlation falling between .00 and 
1.00 always implies some degree of positive association, the degree 
of correspondence depending upon the size of the coefficient. 

* It may be noted that the number of groups (here 3) is unimportant: any 
convenient set may be used. The important point is that when the correlation 
is zero, there is no systematic relationship between two sets of scores. 

Relationship may also be negative; that is, a high degree of one 
trait may be associated with a low degree of another. When negative 
or inverse relationship is perfect, r = -1.00. To illustrate, suppose 
that in a small class of ten schoolboys, the boy who stands first in 
Latin ranks lowest (tenth) in shop work; the boy who stands second 
in Latin ranks next to the bottom (ninth) in shop work; and that 
each boy stands just as far from the top of the list in Latin as from 
the bottom of the list in shop work. Here the correspondence between 
achievement in Latin and performance in shop work is one-to-one 
and definite enough, but the direction of relationship is inverse and 
r = -1.00. Negative coefficients may range from -1.00 up to .00, 
just as positive coefficients may range from .00 up to 1.00. Coefficients 
of -.20, -.50, or -.80 indicate increasing degrees of negative or 
inverse relationship, just as positive coefficients of .20, .50, and .80 
indicate increasing degrees of positive relationship. 
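
The two limiting cases described above can be checked numerically. The sketch below uses the product-moment formula whose calculation the text takes up in later sections; the function name `pearson_r` and the five diameters are hypothetical choices made only for illustration.

```python
from math import isclose, sqrt


def pearson_r(xs, ys):
    """Product-moment r from deviations about the two means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / sqrt(sum_xx * sum_yy)


# Circles: C = 3.1416D for any diameters whatever (values are hypothetical).
diameters = [1.0, 2.5, 4.0, 7.0, 10.0]
circumferences = [3.1416 * d for d in diameters]
r_circle = pearson_r(diameters, circumferences)

# Ten schoolboys: rank k in Latin goes with rank 11 - k in shop work.
latin_ranks = list(range(1, 11))
shop_ranks = [11 - k for k in latin_ranks]
r_ranks = pearson_r(latin_ranks, shop_ranks)

print(round(r_circle, 2), round(r_ranks, 2))
```

Under these assumptions the first coefficient comes out at 1.00 and the second at -1.00, whatever diameters are chosen.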


2. Correlation expressed as agreement between ranks 


The notion underlying correlation can often be most readily 
comprehended from a simple graphic treatment. Three examples will be 
given to illustrate values of r of 1.00, -1.00, and approximately .00. 
Correlation is rarely computed when the number of cases is less than 
25, so that the examples here presented must be considered to have 
illustrative value only. 

Suppose that four tests, A, B, C, and D, have been administered to 
a group of five children. The children have been arranged in order 
of merit on Test A and their scores are then compared separately 
with Tests B, C, and D to give the following three cases: 


      Case 1                Case 2                Case 3 
Pupil    A     B      Pupil    A     C      Pupil    A     D 
  a     15    53        a     15    64        a     15   102 
  b     14    52        b     14    65        b     14   100 
  c     13    51        c     13    66        c     13   104 
  d     12    50        d     12    67        d     12   103 
  e     11    49        e     11    68        e     11   101 


Now if the second series of scores under each case (i.e., B, C, and D) 
is arranged in order of merit from the highest score down, and the two 
scores earned by each child are connected by a straight line, we have 
the following graphs: 


      Case 1                Case 2                Case 3 
     A      B              A      C              A      D 
    15     53             15     68             15    104 
    14     52             14     67             14    103 
    13     51             13     66             13    102 
    12     50             12     65             12    101 
    11     49             11     64             11    100 

Case 1: All connecting lines are horizontal and parallel, and the 
correlation is positive and perfect. r = 1.00 

Case 2: All connecting lines intersect in one point. The correlation is 
negative and perfect, and r = -1.00 

Case 3: No system is exhibited by the connecting lines, but the 
resemblance is closer to Case 2 than to Case 1. Correlation low and 
negative. 


The more nearly the lines connecting the paired scores are 
horizontal and parallel, the higher the positive correlation. The more nearly 
the connecting lines tend to intersect in one point, the larger the 
negative correlation. When the connecting lines show no systematic 
trend, the correlation approaches zero. 
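
The three cases can also be worked through numerically. The sketch below computes r for each case and, as an illustrative device of its own (not a method from the text), counts how many pairs of connecting lines would cross: two lines cross when two children stand in opposite order on the two tests. The names `pearson_r` and `crossings` are hypothetical.

```python
from itertools import combinations
from math import sqrt

test_a = [15, 14, 13, 12, 11]  # pupils a to e, in order of merit on Test A
cases = {
    "Case 1 (A, B)": [53, 52, 51, 50, 49],
    "Case 2 (A, C)": [64, 65, 66, 67, 68],
    "Case 3 (A, D)": [102, 100, 104, 103, 101],
}


def pearson_r(xs, ys):
    """Product-moment r from deviations about the two means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / sqrt(sum_xx * sum_yy)


def crossings(xs, ys):
    """Number of pupil pairs ranked in opposite order on the two tests."""
    return sum(
        1
        for i, j in combinations(range(len(xs)), 2)
        if (xs[i] - xs[j]) * (ys[i] - ys[j]) < 0
    )


summary = {
    name: (round(pearson_r(test_a, scores), 2), crossings(test_a, scores))
    for name, scores in cases.items()
}
for name, (r, crossed) in summary.items():
    print(f"{name}: r = {r:+.2f}, crossing pairs = {crossed} of 10")
```

With these scores no pair crosses in Case 1, all ten pairs cross in Case 2, and half of them cross in Case 3, where r works out to -.10, in keeping with the description "low and negative."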


3. Summary 


To summarize our discussion up to this point, coefficients of 
correlation range over a scale which extends from -1.00 through .00 
to 1.00. A positive correlation indicates that large amounts of the 
one variable tend to accompany large amounts of the other; a 
negative correlation indicates that small amounts of the one variable tend 
to accompany large amounts of the other. A zero correlation 
indicates no consistent relationship. We have illustrated above only 
perfect positive, perfect negative, and approximately zero correlation in 
order to bring out the meaning of correlation in a striking way. Only 
rarely, if ever, however, will a coefficient fall at either extreme of the 
scale, i.e., at 1.00 or -1.00. In most actual problems, calculated r's 
fall at intermediate points, such as .72, -.26, .50, etc. Such r's are to 
be interpreted as “high” or “low” depending in general upon how 
close they are to ±1.00. Interpretation of the degree of relationship 
expressed by r in terms of various criteria will be discussed later on 
pages 173-178. 


II. The Coefficient of Correlation * 


1. The coefficient of correlation as a ratio 


The product-moment coefficient of correlation may be thought of 
essentially as that ratio which expresses the extent to which changes 
in one variable are accompanied by, or are dependent upon, 
changes in a second variable. As an illustration, consider the 
following simple example which gives the paired heights and weights of five 
college seniors: 


  (1)      (2)      (3)     (4)    (5)    (6)     (7)      (8)        (9) 

         Ht. in   Wt. in 
Student  inches    lbs. 
            X        Y       x      y     xy     x/σx     y/σy   (x/σx)(y/σy) 
   a       72      170       3      0      0     1.34      .00       .00 
   b       69      165       0     -5      0      .00     -.37       .00 
   c       66      150      -3    -20     60    -1.34    -1.46      1.96 
   d       70      180       1     10     10      .44      .73       .32 
   e       68      185      -1     15    -15     -.44     1.10      -.48 
                                          55                        1.80 

Mx = 69 in.      σx = 2.24 in. 
My = 170 lbs.    σy = 13.69 lbs. 

correlation = $\frac{\sum\left(\frac{x}{\sigma_x}\cdot\frac{y}{\sigma_y}\right)}{N} = \frac{1.80}{5} = .36$ 

From the X and Y columns it is evident that tall students tend to be 
somewhat heavier than short students, and hence the correlation 
between height and weight is almost certainly positive. The mean 
height is 69 inches, the mean weight 170 pounds, and the σ's are 
2.24 inches and 13.69 pounds, respectively. In column (4) are given 
the deviations (x's) of each man's height from the mean height, and 
in column (5) the deviations (y's) of each man's weight from the 
mean weight. The product of these paired deviations (xy's) is a 
measure of the agreement between individual heights and weights, 
and the larger the sum of the xy column the higher the degree of 
correspondence. When agreement is perfect (and r = 1.00) the sum of the xy 
column has its maximum value. One may wonder why the sum of 
the xy's divided by N (i.e., $\frac{\sum xy}{N}$) would not yield a suitable 
measure of relationship between x and y. The answer is that such an 
average is not a stable measure of relationship, as it depends upon the 
units in which height and weight have been expressed, and consequently 
will vary if centimeters and kilograms, say (as shown in the example 
below), are employed instead of inches and pounds. One way to avoid the 
troublesome matter of differences in units is to divide each x and 
each y by its own σ, i.e., express each deviation as a σ-score. The 
sum of the products of the σ-scores, column (9), divided by N 
yields a ratio which, as we shall see later, is a stable expression of 
relationship. This ratio is the “product-moment” * coefficient of 
correlation. Its value of .36 indicates a fairly high positive 
correlation between height and weight in this small sample. The student 
should note that our ratio or coefficient is simply the average 
product of the σ-scores of corresponding X and Y measures. 

* This section may be taken up after Section III. 
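
The columns of the example can be rebuilt step by step. The following is only a sketch of that arithmetic: the weights are those implied by the printed mean of 170 lbs. and the y deviations, the σ's are taken exactly as printed, and the variable names are illustrative.

```python
heights = [72, 69, 66, 70, 68]        # X, inches, students a to e
weights = [170, 165, 150, 180, 185]   # Y, lbs.

# The sigmas are used as printed (2.24 in., 13.69 lbs.). Observation, not a
# statement from the text: these figures agree with an N - 1 divisor, e.g.
# sqrt(20 / 4) = 2.24. With a divisor of N throughout, the same data would
# give a ratio of about .45 rather than .36.
SIGMA_X = 2.24
SIGMA_Y = 13.69

n = len(heights)
mean_x = sum(heights) / n                                  # 69.0
mean_y = sum(weights) / n                                  # 170.0

x_dev = [h - mean_x for h in heights]                      # column (4)
y_dev = [w - mean_y for w in weights]                      # column (5)
xy = [x * y for x, y in zip(x_dev, y_dev)]                 # column (6)
x_sigma = [x / SIGMA_X for x in x_dev]                     # column (7)
y_sigma = [y / SIGMA_Y for y in y_dev]                     # column (8)
products = [a * b for a, b in zip(x_sigma, y_sigma)]       # column (9)

sum_xy = sum(xy)
ratio = sum(products) / n   # average product of the sigma-scores

print(sum_xy, round(ratio, 2))
```

Carried out without rounding the individual entries, the sum of column (9) differs slightly in the second decimal from the printed 1.80, but the ratio still rounds to .36.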

Let us now investigate the effect upon our ratio of changing the 
units in terms of which X and Y have been expressed. In the example 
below, the heights and weights of the same five students are expressed 
(to the nearest whole number) in centimeters and kilograms instead 
of in inches and pounds: 


  (1)      (2)      (3)     (4)    (5)    (6)     (7)      (8)        (9) 

         Ht. in   Wt. in 
Student   cms.     kgs. 
            X        Y       x      y     xy     x/σx     y/σy   (x/σx)(y/σy) 
   a      183       77       8      0      0     1.43      .00       .00 
   b      175       75       0     -2      0      .00     -.32       .00 
   c      168       68      -7     -9     63    -1.25    -1.43      1.79 
   d      178       82       3      5     15      .53      .79       .42 
   e      173       84      -2      7    -14     -.36     1.11      -.40 
                                          64                        1.81 

Mx = 175 cms.    σx = 5.61 cms. 
My = 77 kgs.     σy = 6.30 kgs. 

correlation = $\frac{\sum\left(\frac{x}{\sigma_x}\cdot\frac{y}{\sigma_y}\right)}{N} = \frac{1.81}{5} = .36$ 

The mean height of our group is now 175 cms. and the mean weight 
77 kgs.; the σ's are 5.61 cms. and 6.30 kgs., respectively. Note that 
the sum of the xy column, namely, 64, differs by 9 from the sum of 
the xy's in the example above, in which inches and pounds were the 
units of measurement. However, when deviations are expressed as 
σ-scores, the sum of their products $\sum\left(\frac{x}{\sigma_x}\cdot\frac{y}{\sigma_y}\right)$ 
divided by N equals .36 as before. 
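
The comparison of the two unit systems can be sketched as follows. The function names are hypothetical; the means and σ's are taken as printed in the two examples, and the centimeter and kilogram values are the rounded whole numbers the text uses.

```python
def mean_product(xs, ys, mean_x, mean_y):
    """Sum of xy divided by N: depends on the units of measurement."""
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)


def sigma_score_ratio(xs, ys, mean_x, mean_y, sigma_x, sigma_y):
    """Average product of sigma-scores, using the means and sigmas supplied."""
    total = sum(
        ((x - mean_x) / sigma_x) * ((y - mean_y) / sigma_y)
        for x, y in zip(xs, ys)
    )
    return total / len(xs)


inches = [72, 69, 66, 70, 68]
pounds = [170, 165, 150, 180, 185]
cms = [183, 175, 168, 178, 173]
kgs = [77, 75, 68, 82, 84]

raw_english = mean_product(inches, pounds, 69, 170)   # 55 / 5
raw_metric = mean_product(cms, kgs, 175, 77)          # 64 / 5

r_english = sigma_score_ratio(inches, pounds, 69, 170, 2.24, 13.69)
r_metric = sigma_score_ratio(cms, kgs, 175, 77, 5.61, 6.30)

print(raw_english, raw_metric)
print(round(r_english, 2), round(r_metric, 2))
```

The unstandardized averages come out at 11.0 and 12.8, while both σ-score ratios round to .36; the small difference in later decimals reflects the rounding of the metric measures to whole numbers.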


* The sum of the deviations from the mean (raised to some power) and 
divided by N is called a “moment.” When corresponding deviations in x and 
y are multiplied together, summed, and divided by N (to give $\frac{\sum xy}{N}$) the 
term “product-moment” is used. 


The quotient $\frac{\sum\left(\frac{x}{\sigma_x}\cdot\frac{y}{\sigma_y}\right)}{N}$ 
is a measure of relationship which remains constant for a given set 
of data, no matter in what units X and Y are expressed. When this 
ratio is written $\frac{\sum xy}{N\sigma_x\sigma_y}$ it becomes the well-known expression for r, 
the product-moment coefficient of correlation.* 

* The coefficient of correlation, r, is often called the “Pearson r” after 
Professor Karl Pearson who developed the product-moment method, following the 
earlier work of Galton and Bravais. See Walker, H. M., Studies in the History 
of Statistical Method (Baltimore: Williams and Wilkins Co., 1929), 
Chapter 5, pp. 96-111. 


2. The scatter diagram and the correlation table 


When N is small, the ratio method described in the preceding 
section may be employed for computing the coefficient of correlation 
between two sets of data. But when N is large, much time and labor 
may be saved by first arranging the data in the form of a diagram or 
chart, and then calculating deviations from assumed, instead of from 
actual, means. Let us consider the diagram in Figure 35. This chart, 
which is called a “scatter diagram” or “scattergram,” represents the 
paired heights and weights of 120 college students. The construction 
of a scattergram is relatively simple. Along the left-hand margin 
from bottom to top are laid off the class-intervals of the height 
distribution, measurement expressed in inches; and along the top of the 
diagram from left to right are laid off the class-intervals of the 
weight distribution, measurement expressed in pounds. Each of the 
120 men is represented on the diagram with respect to height and 
weight. Suppose that a man weighs 150 pounds and is 69 inches tall. 
His weight locates him in the sixth column from the left, and his 
height in the third row from the top. Accordingly, a “tally” is placed 
in the third cell of the sixth column. There are three tallies in all in 
this cell, that is, there are three men who weigh from 150 to 159 
pounds, and are 68-69 inches tall. Each of the 120 men is represented 
by a tally in a cell or square of the table in accordance with the two 
characteristics, height and weight. Along the bottom of the diagram 
in the fx row is tabulated the number of men who fall in each 
weight-interval; while along the right-hand margin in the fy column is 
tabulated the number of men who fall in each height-interval. The fy 
column and fx row must each total 120, the number of men in all. 
After all of the tallies have been listed, the frequency in each cell is 
added and entered on the diagram. The scattergram is then a 
correlation table. 


[Scattergram: Weight in Pounds (X-variable) across the top in intervals 
100-109 to 170-179; height in inches (Y-variable) at the left in 
intervals 60-61 to 72-73; cell tallies, with the fx row and mean heights 
at the bottom and the fy column and mean weights at the right.] 

Summary 

           Mean ht. for given                Mean wt. for given 
Weight       wt. interval         Height       ht. interval 
170-179          70.2             72-73           174.5 
160-169          68.9             70-71           152.0 
150-159          68.9             68-69           142.4 
140-149          67.0             66-67           135.1 
130-139          66.6             64-65           128.0 
120-129          65.4             62-63           125.3 
110-119          64.1             60-61           117.8 
100-109          62.5 


FIG. 35 A scattergram and correlation table showing the paired heights 
and weights of 120 students 

Several interesting facts may be gleaned from the correlation table 
as it stands. For example, all of the men of a given weight-interval 
may be studied with respect to the distribution of their heights. In 
the third column there are twenty-eight men all of whom weigh 
120-129 pounds. One of the twenty-eight is 70-71 inches tall; four are 
68-69 inches tall; nine are 66-67 inches tall; seven are 64-65 inches 
tall; and seven are 62-63 inches tall. In the same way, we may 
classify all of the men of a given height-interval with respect to 
weight distribution. Thus, in the row next to the bottom, there are 
thirteen men all of whom are 62-63 inches tall. Of this group one 
weighs 100-109 pounds; two weigh 110-119 pounds; seven weigh 
120-129 pounds; one weighs 130-139 pounds; and two weigh 140-149 
pounds. It is fairly clear that the “drift” of paired heights and 
weights is from the upper right-hand section of the diagram to the 
lower left-hand section. Even a superficial examination of the 
diagram reveals a fairly marked tendency for heavy, medium, and light 
men to be tall, medium, and short, respectively; and this general 
relationship holds in spite of the scatter of heights and weights 
within any given “array” (an array is the distribution of cases within 
a given column or row). Even before making any calculations, then, 
we should probably be willing to guess that the correlation between 
height and weight is positive and fairly high. 

Let us now go a step further and calculate the mean height of the 
three men who weigh 100-109 pounds, the men in column one. The 
mean height of this group (using the assumed mean method described 
in Chapter 2, p. 36) is 62.5 inches, and this figure has been written 
in at the bottom of the correlation table. In the same way, the mean 
heights of the men who fall in each of the succeeding weight-inter- 
vals have been written in at the bottom of the diagram. These data 
have been tabulated in a somewhat more convenient form below the 
diagram. From this summary, it appears that an actual weight in- 
crease of approximately 70 pounds (104.5-174.5) corresponds to an 
increase in mean height of 7.7 inches; that is, the increase from the 
lightest to the heaviest man is paralleled by an increase of approxi- 
mately eight inches in height. It seems clear, therefore, that the cor- 
relation between height and weight is positive. 

Let us now shift from height to weight, and applying the method 
used above, find the change in mean weight which corresponds to the 
given change in height.* The mean weight of the three men in the 
bottom row of the diagram is 117.8 pounds. The mean weight of the 
thirteen men in the next row from the bottom (who are 62-63 inches 
tall) is 125.3 pounds. The mean weights of the men who fall in the 
other rows have been written in their appropriate places in the M 
column. In the summary of results we find that in this group of 120 
men an increase of about 12 inches in height is accompanied by an 
increase of about 56.7 pounds in mean weight. Thus it appears that 
the taller the man the heavier he tends to be, and again the 
correlation between height and weight is seen to be positive. 

* This change corresponds to the second regression line in the correlation 
diagram (see p. 153). 
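
The array means quoted above can be checked from the two arrays whose cell frequencies the text spells out in full. The sketch below takes the midpoint of each interval as the score of every man in it (66.5 for 66-67 inches, 104.5 for 100-109 pounds) and computes the means directly rather than by the assumed-mean method; the function name `array_mean` is hypothetical.

```python
def array_mean(midpoints, frequencies):
    """Mean of one column or row of a correlation table."""
    total = sum(m * f for m, f in zip(midpoints, frequencies))
    return total / sum(frequencies)


# Third column: the 28 men weighing 120-129 lbs., by height interval.
height_midpoints = [70.5, 68.5, 66.5, 64.5, 62.5]
height_freqs = [1, 4, 9, 7, 7]
mean_height_120s = array_mean(height_midpoints, height_freqs)

# Row next to the bottom: the 13 men 62-63 in. tall, by weight interval.
weight_midpoints = [104.5, 114.5, 124.5, 134.5, 144.5]
weight_freqs = [1, 2, 7, 1, 2]
mean_weight_62s = array_mean(weight_midpoints, weight_freqs)

# Overall change in the array means, taken from the summary table.
height_gain = round(70.2 - 62.5, 1)     # lightest to heaviest column
weight_gain = round(174.5 - 117.8, 1)   # shortest to tallest row

print(round(mean_height_120s, 1), round(mean_weight_62s, 1))
print(height_gain, weight_gain)
```

Both results agree with the table: 65.4 inches for the 120-129 pound column and 125.3 pounds for the 62-63 inch row.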


3. The graphic representation of the correlation coefficient 


It is often helpful in understanding how the correlation coefficient 
measures relationship to see how a correlation of .00 or .50, say, looks 
graphically. Figure 36 (1) pictures a correlation of .50. The data 
in the table are artificial, and were selected to bring out the 
relationship in as unequivocal a fashion as possible. The scores laid off along 
the top of the correlation table from left to right will be referred to 
simply as the X-test “scores,” and the scores laid off at the left of 
the table from bottom to top as the Y-test “scores.” As was done in 
Figure 35, the mean of each Y-row is entered on the chart, and the 
means of the X-columns are entered at the bottom of the diagram. 


[Four correlation charts, (1) to (4), each with the X-test intervals 0-9 
to 40-49 across the top, the Y-test intervals at the left, row means at 
the right, and N = 64 with column frequencies (fx) of 4, 16, 24, 16, 
and 4. Column means as printed: 
  (1) r = .50:    14.5   19.5   24.5   29.5   34.5 
  (2) r = 1.00:    4.5   14.5   24.5   34.5   44.5 
  (3) r = .00:    24.5   24.5   24.5   24.5   24.5 
  (4) r = -.75:   39.5   32.0   24.5   17.0    9.5] 

FIG. 36 The graphical representation of the correlation coefficient 

The means of each Y-array, that is, the means of the “scores” 
falling in each X-column, are indicated on the chart by small crosses. 
Through these crosses a line, called a regression line,* has been 
drawn. This line represents the change in the mean value of Y over 
the given range of X. In similar fashion, the means of each X-array, 
i.e., the means of the scores in each Y-row, are designated on the 
chart by small circles, through which another line has been drawn. 
This second regression line shows the change in the mean value of X 
over the given range of Y. These two lines together represent the 
linear or straight-line relationship between the variables X and Y. 

The closeness of association or degree of correspondence between 
the X- and Y-tests is indicated by the relative positions of these 
two regression lines. When the correlation is positive and perfect, 
the two regression lines close up like a pair of scissors to form one 
line. Chart (2) in Figure 36 shows how the two regression lines look 
when r = 1.00, and the correlation is perfect. Note that the entries 
in Chart (2) are concentrated along the diagonal from the upper 
right- to the lower left-hand section of the diagram. There is no 
“scatter” of scores in the successive columns or rows, all of the scores 
in a given array being concentrated within one cell. If Chart (2) 
represented a correlation table of height and weight, we should know 
that the tallest man was the heaviest, the next tallest man the next 
heaviest, and that throughout the group the correspondence of height 
and weight was perfect. 

A very different picture from that of perfect correlation is 
presented in Chart (3) where the correlation is .00. Here the two 
regression lines, through the means of the columns and rows, have spread 
out until they are perpendicular to each other. There is no change in 
the mean Y-score over the whole range of X, and no change in the 
mean X-score over the whole range of Y. This is analogous to the 
situation described on page 123, in which the mean tapping rate of a 
group of students was the same for those with “high,” “middle,” and 
“low” AGCT scores. When the correlation is zero, there is no way 
of telling from a subject's performance in one test what his 
performance will be in the other test. The best one can do is to select the 
mean as the most probable value of the unknown score. 


* Regression lines have important properties; they will be defined and 
discussed more fully in Chapter 7. 

Chart (4) in Figure 36 represents a correlation coefficient of -.75. 
Negative relationship is shown by the fact that the regression lines, 
through the means of the columns and rows, run from the upper left- 
to the lower right-hand section of the diagram. The regression lines 
are closer together than in Chart (1) where the correlation is .50, but 
are still separated. If this chart represented a correlation table of 
height and weight, we should know that the tendency was strong for 
tall men to be light, and for short men to be heavy. 
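
The column means printed at the foot of the four charts in Figure 36 show how the line through the crosses tilts as r changes. The sketch below fits a weighted straight line to those means; it assumes column frequencies of 4, 16, 24, 16, and 4 and interval midpoints of 4.5 to 44.5, and the function name `weighted_slope` is hypothetical. That the slope equals r is a consequence of a further assumption, namely that the X and Y distributions of these artificial data have equal σ's.

```python
x_midpoints = [4.5, 14.5, 24.5, 34.5, 44.5]   # X-test intervals 0-9 to 40-49
column_freqs = [4, 16, 24, 16, 4]             # fx, assumed alike in all charts

column_means = {
    "(1)": [14.5, 19.5, 24.5, 29.5, 34.5],
    "(2)": [4.5, 14.5, 24.5, 34.5, 44.5],
    "(3)": [24.5, 24.5, 24.5, 24.5, 24.5],
    "(4)": [39.5, 32.0, 24.5, 17.0, 9.5],
}


def weighted_slope(xs, ys, weights):
    """Slope of the straight line through (x, y) points, weighted by frequency."""
    n = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, xs)) / n
    mean_y = sum(w * y for w, y in zip(weights, ys)) / n
    sum_xy = sum(w * (x - mean_x) * (y - mean_y)
                 for w, x, y in zip(weights, xs, ys))
    sum_xx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    return sum_xy / sum_xx


slopes = {
    chart: weighted_slope(x_midpoints, means, column_freqs)
    for chart, means in column_means.items()
}
for chart, slope in slopes.items():
    print(chart, slope)
```

Under these assumptions the slopes are .50, 1.00, .00, and -.75, the same values as the coefficients given for the four charts.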


[Figure: the correlation table of Figure 35, with Weight in Pounds (X) 
across the top and height in inches (Y) at the left; the column means 
are marked by crosses, the row means by circles, and two straight 
regression lines are drawn in.] 

FIG. 37 Graphical representation of the correlation between height and 
weight in a group of 120 college students (Fig. 35) 


The charts in Figure 36 represent, as was stated above, a linear 
relationship between sets of artificial test scores. The data were 
selected so as to be symmetrical around the means of each column 
and row, and hence the regression lines go through all of the crosses 
and through all of the circles in the successive columns and rows. It 
is rarely if ever true, however, that the regression lines pass through 
all of the means of the columns and rows in a correlation table which 
represents actual test scores or other real measures. Figure 37, which 
reproduces the correlation table of heights and weights given on 
page 129, illustrates this fact. The mean heights of the men in the 
weight (X) columns are indicated by crosses, and the mean weights 
of the men in the height (Y) rows by circles, as in Figure 36. Note 
that the series of short lines joining the successive crosses or circles 
presents a decidedly jagged appearance. Two straight lines have been 
drawn in to describe the general trend of these irregular lines. These 
two lines go through, or as close as possible to, the crosses or the 
circles, more consideration being given to those points near the middle 
of the chart (because they are based upon more data) than to those 
at the extremes (which are based upon few scores). Regression lines 
are called lines of “best fit” because they satisfy certain 
mathematical criteria to be given later (p. 154). Such lines describe better than 
any other straight lines the “run” or “drift” of the crosses and circles 
across the chart. 

In Chapter 7 we shall develop equations for the “best fitting” lines 
and show how they may be drawn in to describe the trend of irregu- 
lar points on a correlation table. For the present, the important fact 
to get clearly in mind is that when correlation is linear, the means 
of the columns and rows in a correlation table can be adequately 
described by two straight lines and the closer together these two lines, 
the higher the correlation. 


III. The Calculation of the Coefficient of Correlation by the
Product-Moment Method


1. The calculation of r from a correlation table


Having discussed the meaning of correlation in the last sections, 
we shall now proceed to the calculation of the coefficient of correla- 
tion by the product-moment method. Figure 38 will serve as an illus- 
tration of the computations required. This correlation table gives the 
paired heights and weights of 120 college students, and is derived 
from the scattergram for the same data shown in Figure 35. The fol- 
lowing outline of the steps in the process of calculating r will be best 
understood if the student will constantly refer to Figure 38 as he 
reads through each step. 


Step 1


Construct a scattergram for the two variables to be correlated, 
and from it draw up a correlation table as described on page 128.


Step 2


The distribution of heights for the 120 men is in the $f_y$ column at
the right of the diagram. Assume a mean for the height distribution,
using the rules given in Chapter 2, page 37, and draw double lines
to mark off the row in which the assumed mean (ht) falls. The
mean for the height distribution has been taken at 66.5 in. (midpoint
of interval 66-67) and the $y'$'s have been taken from this point. The
prime (') of the $x'$'s and $y'$'s indicates that these deviations are taken
from the assumed means of the X and Y distributions (see p. 37).
Now fill in the $fy'$ and $fy'^2$ columns. From the first column $c_y$, the
correction in units of interval, is obtained; and this correction
together with the sum of the $fy'^2$ will give the $\sigma$ of the height
distribution, $\sigma_y$. As shown by the calculations in Figure 38, the value
of $\sigma_y$ is 2.62 inches.

The distribution of the weights of the 120 men is in the $f_x$ row at
the bottom of the diagram. Assume a mean for the weight distribution,
and draw double lines to designate the column under the assumed
mean (wt). The mean for the weight distribution is taken at
134.5 pounds (midpoint of interval 130-139), and the $x'$'s are taken
from this point. Fill in the $fx'$ and the $fx'^2$ rows; from the first
calculate $c_x$, the correction in units of interval, and from the second
calculate $\sigma_x$, the $\sigma$ of the entire weight distribution. In Figure 38,
the value of $\sigma_x$ is found to be 15.54 pounds.
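The Assumed Mean computation of Step 2 can be sketched in a few lines of Python. This is only an illustration: the row frequencies below are an assumed reading of the $f_y$ column of the chart (they total 120 and place thirty-three men in the 66-67 inch row), and the variable names are hypothetical.

```python
import math

# Row frequencies f_y, topmost row first. ASSUMPTION: values as read from
# the f_y column of the correlation chart; they are not given in the text.
f_y = [1, 16, 28, 33, 26, 13, 3]
# Deviations y' of each row from the assumed-mean row (66-67 in.),
# measured in class-interval units.
y_dev = [3, 2, 1, 0, -1, -2, -3]
interval = 2        # inches per class interval
assumed_mean = 66.5  # midpoint of the 66-67 interval

n = sum(f_y)
# Correction in interval units: c_y = (sum of f*y') / N
c_y = sum(f * d for f, d in zip(f_y, y_dev)) / n
# Sigma in interval units: sqrt(sum of f*y'^2 / N - c_y^2)
sigma_units = math.sqrt(sum(f * d * d for f, d in zip(f_y, y_dev)) / n - c_y ** 2)
sigma_y = sigma_units * interval      # back to inches
mean_y = assumed_mean + c_y * interval

print(n, round(c_y, 2), round(sigma_units, 2), round(sigma_y, 2), round(mean_y, 1))
```

Under this reading of the chart the sketch gives a correction of about .02 interval and a $\sigma_y$ of about 1.31 intervals, or 2.62 inches, in agreement with the values quoted for Figure 38.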


Step 3 


The calculations in Step 2 simply repeat the now familiar process
of calculating $\sigma$ by the Assumed Mean method. Our first new task is
to fill in the $\Sigma x'y'$ column at the right of the chart. Since the entries
in this column may be either + or -, two columns are provided
under $\Sigma x'y'$. Calculation of the entries in the $\Sigma x'y'$ column may be
illustrated by considering, first, the single entry in the only occupied
cell in the topmost row. The deviation of this cell from the AM of
the weight distribution, that is, its $x'$, is four intervals, and its
deviation from the AM of the height distribution, that is, its $y'$, is three
intervals. Hence, the product of the deviations of this cell from the
two AM's is 4 × 3 or 12; and a small figure (12) is placed in the
upper right-hand corner of the cell.* The "product-deviation" of the
one entry in this cell is 1(4 × 3) or 12 also, and hence a figure 12 is
placed in the lower left-hand corner of the cell. This figure shows the
product of the deviations of this single entry from the AM's of the
two distributions. Since there are no other entries in the cells of this
row, 12 is placed at once under the + sign in the $\Sigma x'y'$ column.

* We may consider the coördinates of this cell to be $x' = 4$, $y' = 3$. The $x'$ is
obtained by counting over four intervals from the vertical column containing
the AM (wt), and the $y'$ by counting up three intervals from the horizontal row
containing the AM (ht). The unit of measurement is the class-interval.

Consider now the next row from the top, taking the cells in order
from right to left. The cell immediately below the one for which we
have just found the product-deviation also deviates four intervals
from the AM (wt) (its $x'$ is 4), but its deviation from the AM (ht)
is only two intervals (its $y'$ is 2). The product-deviation of this cell,
therefore, is 4 × 2 or 8, as shown by the small figure (8) in the upper
right-hand corner of the cell. There are three entries in this cell, and
since each has a product-deviation of 8, the final entry in the lower
left-hand corner of the cell is 3(4 × 2) or 24. The product-deviation
of the second cell in this row is 6 (its $x'$ is 3 and its $y'$ is 2) and since
there are two entries in the cell, the final entry is 2(3 × 2) or 12.
Each of the four entries in the third cell over has a product-deviation
of 4 (since $x' = 2$ and $y' = 2$) and the final entry is 16. In the fourth
cell, each of the three entries has a product-deviation of 2 ($x' = 1$ and
$y' = 2$) and the cell entry is 6. The entry in the fifth cell over, the
cell in the AM (wt) column, is 0, since $x'$ is 0, and accordingly
3(2 × 0) must be 0. Note carefully the entry (-2) in the last cell
of the row. Since the deviations of this cell are $x' = -1$, and $y' = 2$,
the product 1(-1 × 2) = -2, and the final entry is negative. Now
we may total up the plus and minus entries in this row and enter the
results, 58 and -2, in the $\Sigma x'y'$ column under the appropriate signs.

The final entries in the cells for the other rows of the table and the
sums of the product-deviations of each row are obtained as
illustrated for the two rows above. The reader should bear in mind in
calculating $x'y'$'s that the product-deviations of all entries in the
cells in the first and third quadrants of the table are positive, while
the product-deviations of all entries in the second and fourth
quadrants are negative (p. 10). It should be remembered, too, that all
entries either in the column headed by the $AM_x$ or the row headed
by the $AM_y$ have zero product-deviations, since in the one case the $x'$
and in the other the $y'$ equals zero.

Since all entries in a given row have the same $y'$, the arithmetic
of calculating $x'y'$'s may often be considerably reduced if each entry
in a row-cell is first multiplied by its $x'$, and the sum of these
deviations ($\Sigma x'$) multiplied once for all by the common $y'$, viz., the $y'$ of
the row. The last two columns, $\Sigma x'$ and $\Sigma x'y'$, contain the entries for
the rows. To illustrate the method of calculation, in the second
row from the bottom, taking the cells in order from right to
left, and multiplying the entry in each cell by its $x'$, we have
(2 × 1) + (1 × 0) + (7 × -1) + (2 × -2) + (1 × -3) or -12. If
we multiply this "deviation-sum" by the $y'$ of the whole row (i.e.,
by -2) the result is 24, which is the final entry in the $\Sigma x'y'$ column.
Note that this entry checks the 28 and -4 entered separately in the
$\Sigma x'y'$ column by the longer method. This shorter method is often
employed in printed correlation charts and is recommended for use
as soon as the student understands fully how the cell entries are
obtained.
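The agreement between the long and the short method for a single row can be checked with a small Python sketch. It uses the cell frequencies and $x'$ values quoted above for the second row from the bottom; the variable names are illustrative only.

```python
# Second row from the bottom, cells taken right to left, as quoted in the text.
cell_counts = [2, 1, 7, 2, 1]      # number of entries in each occupied cell
x_dev = [1, 0, -1, -2, -3]         # x' of each cell, in class-interval units
y_dev_row = -2                     # common y' of the whole row

# Long method: one product-deviation total per cell, plus and minus kept apart.
products = [f * x * y_dev_row for f, x in zip(cell_counts, x_dev)]
plus_total = sum(p for p in products if p > 0)
minus_total = sum(p for p in products if p < 0)

# Short method: sum f*x' across the row, then multiply once by the row's y'.
deviation_sum = sum(f * x for f, x in zip(cell_counts, x_dev))
row_entry = deviation_sum * y_dev_row

print(plus_total, minus_total, deviation_sum, row_entry)
```

The long method yields 28 and -4, the short method a deviation-sum of -12 and a row entry of 24, so the two routes agree (28 - 4 = 24).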


Step 4 (Checks) 


The $\Sigma x'y'$ may be checked by computing the product-deviations
and summing for columns instead of rows. The two rows at the
bottom of the diagram, $\Sigma y'$ and $\Sigma x'y'$, show how this is done. We may
illustrate with the first column on the left, taking the cells from top
to bottom. Multiplying the entry in each cell by its appropriate $y'$,
we have (1 × -1) + (1 × -2) + (1 × -3) or -6. When this entry
in the $\Sigma y'$ row is multiplied by the common $x'$ of the column (i.e.,
by -3) the final entry in the $\Sigma x'y'$ row is 18. The sum of the $x'y'$
computed from the rows should check the sum of the $x'y'$ computed
from the columns.

Two other useful checks are shown in Figure 38. The $fy'$ will equal
the $\Sigma y'$ and the $fx'$ will equal the $\Sigma x'$ if no error has been made. The
$fy'$ and the $fx'$ are the same as the $\Sigma y'$ and $\Sigma x'$; although these
columns and rows are designated differently, they denote in each case
the sum of deviations around their AM.


Step 5 


When all of the entries in the $\Sigma x'y'$ column have been made, and
the column totaled, the coefficient of correlation may be calculated
by the formula

$$r = \frac{\dfrac{\Sigma x'y'}{N} - c_x c_y}{\sigma_x \sigma_y} \qquad (22)$$

(coefficient of correlation when deviations are taken from
the assumed means of the two distributions) *


* This formula for $r$ differs slightly from the ratio formula developed on
page 128. The fact that deviations are taken from assumed rather than from
actual means makes it necessary to correct $\Sigma x'y'/N$ by subtracting the product of
the two corrections $c_x$ and $c_y$.

Substituting 146 for $\Sigma x'y'$; .02 for $c_y$; .18 for $c_x$; 1.31 for $\sigma_y$; 1.55
for $\sigma_x$; and 120 for $N$, $r$ is found to be .60. (See Fig. 38.)

It is very important to remember that $c_x$, $c_y$, $\sigma_x$, and $\sigma_y$ are all left
in units of class-interval in formula (22). This is done because all
product-deviations ($x'y'$'s) are in interval-units, and it is desirable
therefore to keep all of the terms in the formula in interval-units.
Leaving the corrections and the two $\sigma$'s in units of class-interval
facilitates computation, and does not change the result (i.e., the
value of the coefficient of correlation).
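The substitution in formula (22) is easily verified. The following Python sketch simply evaluates the formula with the values quoted from Figure 38; the function name `r_assumed_means` is a hypothetical helper, not part of any library.

```python
def r_assumed_means(sum_xy, n, c_x, c_y, sigma_x, sigma_y):
    """Formula (22): every argument is in class-interval units."""
    return (sum_xy / n - c_x * c_y) / (sigma_x * sigma_y)

# Values quoted in the text for the height-weight table of Figure 38.
r = r_assumed_means(sum_xy=146, n=120, c_x=0.18, c_y=0.02,
                    sigma_x=1.55, sigma_y=1.31)
print(round(r, 2))
```

The result rounds to .60, as stated. Note that the correction term $c_x c_y$ is only .0036 here, because both assumed means lie close to the actual means.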


2. The calculation of r from ungrouped data 


(1) THE FORMULA FOR r WHEN DEVIATIONS ARE TAKEN FROM THE
MEANS OF THE TWO DISTRIBUTIONS X AND Y

In formula (22) $x'$ and $y'$ deviations are taken from assumed
means; and hence it is necessary to correct $\dfrac{\Sigma x'y'}{N}$ by the product of
the two corrections, $c_x$ and $c_y$ (p. 138). When deviations have been
taken from the actual means of the two distributions, instead of from
assumed means, no correction is needed, as both $c_x$ and $c_y$ are zero.
Under these conditions, formula (22) becomes

$$r = \frac{\Sigma xy}{N \sigma_x \sigma_y} \qquad (23)$$

(coefficient of correlation when deviations are taken from
the means of the two distributions)

which is the ratio for measuring correlation developed on page 128.

If we write $\sqrt{\dfrac{\Sigma x^2}{N}}$ for $\sigma_x$ and $\sqrt{\dfrac{\Sigma y^2}{N}}$ for $\sigma_y$, the $N$'s cancel and formula
(23) becomes

$$r = \frac{\Sigma xy}{\sqrt{\Sigma x^2 \times \Sigma y^2}} \qquad (24)$$

(coefficient of correlation when deviations are taken from
the means of the two distributions)

in which $x$ and $y$ are deviations from the actual means as in (23) and
$\Sigma x^2$ and $\Sigma y^2$ are the sums of the squared deviations in $x$ and $y$ taken
from the two means.

When N is fairly large, so that the data can be grouped into a cor- 
relation table, formula (22) is always used in preference to formulas 
(23) or (24) as it entails much less calculation. Formulas (23) and 
(24) may be used to good advantage, however, in finding the
correlation between short, ungrouped series (say, twenty-five cases or so).
It is not necessary to tabulate the scores into a frequency distribu- 
tion. An illustration of the use of formula (24) is given in Table 18, 
below. The problem is to find the correlation between the scores 
made by twelve adults on two tests of “controlled association.” 

The steps in computing r may be outlined as follows: 


Step 1


Find the mean of Test 1 (X) and the mean of Test 2 (Y). The 
means in Table 18 are 62.5 and 30.4, respectively. 


Step 2 


Find the deviation of each score on Test 1 from its mean, 62.5, and
enter it in column $x$. Next find the deviation of each score in Test 2
from its mean, 30.4, and enter it in column $y$.


Step 3 


Square all of the $x$'s and all of the $y$'s and enter these squares in
columns $x^2$ and $y^2$, respectively. Total these columns to obtain $\Sigma x^2$
and $\Sigma y^2$.


TABLE 18 To illustrate the calculation of r from ungrouped scores
when deviations are taken from the means of the series


          Test 1  Test 2
Subject     X       Y        x       y       x²       y²        xy

   A        50      22    -12.5    -8.4   156.25    70.56    105.00
   B        54      25     -8.5    -5.4    72.25    29.16     45.90
   C        56      34     -6.5     3.6    42.25    12.96    -23.40
   D        59      28     -3.5    -2.4    12.25     5.76      8.40
   E        60      26     -2.5    -4.4     6.25    19.36     11.00
   F        62      30      -.5     -.4      .25      .16       .20
   G        61      32     -1.5     1.6     2.25     2.56     -2.40
   H        65      30      2.5     -.4     6.25      .16     -1.00
   I        67      28      4.5    -2.4    20.25     5.76    -10.80
   J        71      34      8.5     3.6    72.25    12.96     30.60
   K        71      36      8.5     5.6    72.25    31.36     47.60
   L        74      40     11.5     9.6   132.25    92.16    110.40

           750     365                     595.00   282.92    321.50
                                           (Σx²)    (Σy²)     (Σxy)

M_X = 62.5     M_Y = 30.4

$$r = \frac{\Sigma xy}{\sqrt{\Sigma x^2 \times \Sigma y^2}} = \frac{321.50}{\sqrt{595 \times 282.92}} = .78 \qquad (24)$$


Step 4


Multiply the $x$'s and $y$'s in the same rows, and enter these products
(with due regard for sign) in the $xy$ column. Total the $xy$ column,
taking account of sign, to get $\Sigma xy$.


Step 5 


Substitute for $\Sigma xy$, 321.50; for $\Sigma x^2$, 595; and for $\Sigma y^2$, 282.92 in
formula (24), as shown in Table 18, and solve for $r$.
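Steps 1 to 5 can be followed in a short Python sketch of formula (24), using the twelve pairs of scores from Table 18. The function name is illustrative. One small difference from the table should be noted: the sketch uses the unrounded mean of Test 2 (30.4167) where the table uses 30.4, so its sums differ from the table's in the second decimal place.

```python
import math

# Scores of the twelve subjects (A to L) on the two tests, from Table 18.
test1 = [50, 54, 56, 59, 60, 62, 61, 65, 67, 71, 71, 74]
test2 = [22, 25, 34, 28, 26, 30, 32, 30, 28, 34, 36, 40]

def r_from_means(xs, ys):
    """Formula (24): r = sum(xy) / sqrt(sum(x^2) * sum(y^2))."""
    n = len(xs)
    mean_x = sum(xs) / n                      # Step 1
    mean_y = sum(ys) / n
    x = [v - mean_x for v in xs]              # Step 2: deviations
    y = [v - mean_y for v in ys]
    sum_x2 = sum(d * d for d in x)            # Step 3: squared deviations
    sum_y2 = sum(d * d for d in y)
    sum_xy = sum(a * b for a, b in zip(x, y)) # Step 4: cross products
    return sum_xy / math.sqrt(sum_x2 * sum_y2)  # Step 5

r = r_from_means(test1, test2)
print(round(r, 2))
```

The coefficient rounds to .78, the value found in Table 18.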

While formula (24) is useful in calculating $r$ directly from two
ungrouped series of scores, it has the same disadvantage as the "long
method" of calculating means and $\sigma$'s described in Chapters 2 and 3.
The deviations $x$ and $y$ when taken from the actual means are usually
decimals and the multiplication and squaring of these values is often
a tedious task. For this reason, even when working with short
ungrouped series, it is often easier to assume means, calculate
deviations from these AM's, and apply formula (22). The procedure is
illustrated in Table 19 with the same data given in Table 18. Note
that the two means, $M_X$ and $M_Y$, are first calculated. The
corrections, $c_x$ and $c_y$, are found by subtracting $AM_X$ from $M_X$ and $AM_Y$
from $M_Y$ (p. 38). Since deviations are taken from assumed means,
fractions are avoided; and the calculations of $\Sigma x'^2$, $\Sigma y'^2$, $\Sigma x'y'$ are
readily made. Substitution in formula (22) then gives $r$.


TABLE 19 To illustrate the calculation of r from ungrouped scores when
deviations are taken from the assumed means of the series


          Test 1  Test 2
Subject     X       Y       x'      y'     x'²     y'²    x'y'

   A        50      22     -10      -8     100      64      80
   B        54      25      -6      -5      36      25      30
   C        56      34      -4       4      16      16     -16
   D        59      28      -1      -2       1       4       2
   E        60      26       0      -4       0      16       0
   F        62      30       2       0       4       0       0
   G        61      32       1       2       1       4       2
   H        65      30       5       0      25       0       0
   I        67      28       7      -2      49       4     -14
   J        71      34      11       4     121      16      44
   K        71      36      11       6     121      36      66
   L        74      40      14      10     196     100     140

           750     365                     670     285     334
                                          (Σx'²)  (Σy'²)  (Σx'y')

AM_X = 60.0     AM_Y = 30.0
M_X  = 62.5     M_Y  = 30.4
c_x  = 2.5      c_y  = .4
c_x² = 6.25     c_y² = .16

$$r = \frac{\dfrac{334}{12} - 1.00}{7.04 \times 4.86} = .78 \qquad (22)$$
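The Assumed Mean procedure of Table 19 may be sketched in Python as follows. The assumed means of 60 and 30 are those of the table; the function name is hypothetical, and the class interval is taken as 1 since the scores are ungrouped. Because the sketch carries the corrections unrounded, its intermediate values differ very slightly from the rounded ones in the table.

```python
import math

test1 = [50, 54, 56, 59, 60, 62, 61, 65, 67, 71, 71, 74]
test2 = [22, 25, 34, 28, 26, 30, 32, 30, 28, 34, 36, 40]

def r_from_assumed_means(xs, ys, am_x, am_y):
    """Formula (22) for ungrouped scores (class interval = 1)."""
    n = len(xs)
    xd = [v - am_x for v in xs]               # x' deviations from AM_X
    yd = [v - am_y for v in ys]               # y' deviations from AM_Y
    c_x = sum(xd) / n                         # correction = M_X - AM_X
    c_y = sum(yd) / n                         # correction = M_Y - AM_Y
    sigma_x = math.sqrt(sum(d * d for d in xd) / n - c_x ** 2)
    sigma_y = math.sqrt(sum(d * d for d in yd) / n - c_y ** 2)
    sum_xy = sum(a * b for a, b in zip(xd, yd))
    return (sum_xy / n - c_x * c_y) / (sigma_x * sigma_y)

r = r_from_assumed_means(test1, test2, am_x=60, am_y=30)
sum_x2 = sum((v - 60) ** 2 for v in test1)
sum_y2 = sum((v - 30) ** 2 for v in test2)
sum_xy = sum((a - 60) * (b - 30) for a, b in zip(test1, test2))
print(sum_x2, sum_y2, sum_xy, round(r, 2))
```

The sums of the whole-number deviations come out as 670, 285, and 334, and the coefficient again rounds to .78.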


(2) THE CALCULATION OF r FROM RAW SCORES, I.E., WHEN DEVIATIONS
ARE TAKEN FROM ZERO

The calculation of $r$ may often be carried out most readily,
especially when a calculating machine is available, by means of the
following formula which is based upon "raw" or obtained scores:

$$r = \frac{\Sigma XY - N M_X M_Y}{\sqrt{[\Sigma X^2 - N M_X^2][\Sigma Y^2 - N M_Y^2]}} \qquad (25)$$

(coefficient of correlation calculated from raw or obtained scores)

In this formula, $X$ and $Y$ are obtained scores, and $M_X$ and $M_Y$ are
the means of the X and Y series, respectively. $\Sigma X^2$ and $\Sigma Y^2$ are the
sums of the squared X and Y values, and $N$ is the number of cases.

Formula (25) is derived directly from formula (22) by assuming
the means of the X and Y tests to be zero. If $AM_X$ and $AM_Y$ are zero,
each X and Y score is a deviation from its AM as it stands, and hence
we work with the scores themselves. Since the correction, $c$, always
equals $M - AM$, it follows that when the AM equals 0, $c_x = M_X$,
$c_y = M_Y$ and $c_x c_y = M_X M_Y$. Furthermore, when $c_x = M_X$ and
$c_y = M_Y$ and the "scores" are "deviations," the formula

$$\sigma_x = \sqrt{\frac{\Sigma x'^2}{N} - c_x^2} \times \text{interval}$$

(see p. 54) becomes

$$\sigma_x = \sqrt{\frac{\Sigma X^2}{N} - M_X^2}$$

and $\sigma_y$ for the same reason equals $\sqrt{\dfrac{\Sigma Y^2}{N} - M_Y^2}$. If we substitute
these equivalents for $c_x c_y$, $\sigma_x$, and $\sigma_y$ in formula (22), the formula for
$r$ in terms of raw scores given in (25) is obtained.

An alternate form of (25) is often more useful in practice. This is

$$r = \frac{N \Sigma XY - \Sigma X \times \Sigma Y}{\sqrt{[N \Sigma X^2 - (\Sigma X)^2][N \Sigma Y^2 - (\Sigma Y)^2]}} \qquad (26)$$

(coefficient of correlation calculated from raw or obtained scores)

This formula is obtained from (25) by substituting $\dfrac{\Sigma X}{N}$ for $M_X$, and
$\dfrac{\Sigma Y}{N}$ for $M_Y$ in numerator and denominator, and canceling the $N$'s.


The calculation of r from original scores is shown in Table 20. 
The data are again the two sets of twelve scores obtained on the 
“controlled association” tests, the correlation for which was found 
to be .78 in Table 18. This short example is for the purpose of illus- 
trating the arithmetic and must not be taken as a recommendation 
that formula (25) be used only with short series. As a matter of fact, 
formula (25) or (26) is most useful, perhaps, with long series, espe- 
cially if one is working with a calculating machine. 


TABLE 20 To illustrate the calculation of r from ungrouped data when
deviations are original scores (AM's = 0)


          Test 1  Test 2
Subject     X       Y       X²       Y²       XY

   A        50      22     2500      484     1100
   B        54      25     2916      625     1350
   C        56      34     3136     1156     1904
   D        59      28     3481      784     1652
   E        60      26     3600      676     1560
   F        62      30     3844      900     1860
   G        61      32     3721     1024     1952
   H        65      30     4225      900     1950
   I        67      28     4489      784     1876
   J        71      34     5041     1156     2414
   K        71      36     5041     1296     2556
   L        74      40     5476     1600     2960

           750     365    47470    11385    23134

M_X = 62.50     M_Y = 30.42     (means to two decimals)

$$r = \frac{23134 - 12 \times 62.50 \times 30.42}{\sqrt{[47470 - 12 \times (62.50)^2][11385 - 12 \times (30.42)^2]}} \qquad (25)$$

r = .78
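The raw-score computation lends itself readily to mechanical calculation. The following Python sketch applies formula (26), the alternate form of (25), to the scores of Table 20; the function name is illustrative only. Since all of the sums are whole numbers, no rounding of the means is involved.

```python
import math

test1 = [50, 54, 56, 59, 60, 62, 61, 65, 67, 71, 71, 74]
test2 = [22, 25, 34, 28, 26, 30, 32, 30, 28, 34, 36, 40]

def r_raw_scores(xs, ys):
    """Formula (26): r computed directly from raw or obtained scores."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(v * v for v in xs)
    sum_y2 = sum(v * v for v in ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# The column totals of Table 20, recomputed as a check on the arithmetic.
totals = (sum(v * v for v in test1), sum(v * v for v in test2),
          sum(a * b for a, b in zip(test1, test2)))
r = r_raw_scores(test1, test2)
print(totals, round(r, 2))
```

The totals agree with those of Table 20 (47470, 11385, and 23134), and $r$ rounds to .78.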


The computation by formula (26) is straightforward and the
method easy to follow, but the calculations become tedious if the
scores are expressed in more than two digits. When using formula
(26), therefore, it will often greatly lessen the arithmetical work, if
we first "reduce" the original scores by subtracting a constant
quantity from each of the original X and Y scores. In Table 21, the same
two series of twelve scores have been reduced by subtracting 65 from
each of the X scores, and 25 from each of the Y scores. The reduced
scores, entered in the table under $X'$ and $Y'$, are first squared to give
$\Sigma X'^2$ and $\Sigma Y'^2$, and then multiplied by rows to give $\Sigma X'Y'$.
Substitution of these values in formula (26) gives the coefficient of
correlation $r$. If the means of the two series are wanted, these may readily
be found by adding to $\dfrac{\Sigma X'}{N}$ and $\dfrac{\Sigma Y'}{N}$ the amounts by which the X and
Y scores were reduced (see computations in Table 21).


TABLE 21 To illustrate the calculation of r from ungrouped data when
deviations are original scores (AM's = 0)

Scores are "reduced" by the subtraction of 65 from each X, and 25
from each Y to give X' and Y'.


          Test 1  Test 2
Subject     X       Y       X'      Y'     X'²     Y'²    X'Y'

   A        50      22     -15      -3     225       9      45
   B        54      25     -11       0     121       0       0
   C        56      34      -9       9      81      81     -81
   D        59      28      -6       3      36       9     -18
   E        60      26      -5       1      25       1      -5
   F        62      30      -3       5       9      25     -15
   G        61      32      -4       7      16      49     -28
   H        65      30       0       5       0      25       0
   I        67      28       2       3       4       9       6
   J        71      34       6       9      36      81      54
   K        71      36       6      11      36     121      66
   L        74      40       9      15      81     225     135

           750     365     -30      65     670     635     159
                          (ΣX')   (ΣY')  (ΣX'²)  (ΣY'²)  (ΣX'Y')

$$M_X = \frac{\Sigma X'}{N} + 65 = \frac{-30}{12} + 65 = -2.5 + 65 = 62.5$$

$$M_Y = \frac{\Sigma Y'}{N} + 25 = \frac{65}{12} + 25 = 5.4 + 25 = 30.4$$

$$r = \frac{(12 \times 159) - (-30 \times 65)}{\sqrt{[12 \times 670 - (-30)^2][12 \times 635 - (65)^2]}} = \frac{3858}{4923} = .78$$


The method of computing $r$ by first reducing the scores is usually
superior to the method of applying formula (25) or (26) directly to
the raw scores. This is because we deal with smaller whole numbers,
and much of the arithmetic can be done mentally. When raw scores
have more than two digits, they are cumbersome to square and
multiply unless reduced. The student should note that instead of 65 and
25 other constants might have been used to reduce the X and Y
scores. If the smallest X and Y scores had been subtracted, namely,
50 and 22, all of the $X'$ and $Y'$ would, of course, have been positive.
This is an advantage in machine calculation but these reduced scores
would have been somewhat larger numerically than are the reduced
scores in Table 21. In general, the best plan in reducing scores is to
subtract constants which are close to the means. The reduced scores
are then both plus and minus, but are numerically about as small as
we can make them.
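That the choice of constants does not affect the result may be shown with a brief Python sketch. It applies formula (26) to the scores after several different reductions, including the pair used in Table 21 (65 and 25) and the smallest scores (50 and 22). The helper names `r_raw_scores` and `reduce_scores` are hypothetical.

```python
import math

test1 = [50, 54, 56, 59, 60, 62, 61, 65, 67, 71, 71, 74]
test2 = [22, 25, 34, 28, 26, 30, 32, 30, 28, 34, 36, 40]

def r_raw_scores(xs, ys):
    """Formula (26) applied to whatever scores are supplied."""
    n = len(xs)
    num = n * sum(a * b for a, b in zip(xs, ys)) - sum(xs) * sum(ys)
    den = math.sqrt((n * sum(a * a for a in xs) - sum(xs) ** 2)
                    * (n * sum(b * b for b in ys) - sum(ys) ** 2))
    return num / den

def reduce_scores(scores, constant):
    """Subtract the same constant from every score."""
    return [s - constant for s in scores]

# (constant for X, constant for Y): none, Table 21, and the smallest scores.
reductions = [(0, 0), (65, 25), (50, 22)]
results = [r_raw_scores(reduce_scores(test1, cx), reduce_scores(test2, cy))
           for cx, cy in reductions]

x_red = reduce_scores(test1, 65)
y_red = reduce_scores(test2, 25)
print(sum(x_red), sum(y_red), [round(v, 4) for v in results])
```

All three reductions lead to the same coefficient; only the size of the intermediate numbers changes. With 65 and 25 the sums of the reduced scores are -30 and 65, as in Table 21.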


(3) THE CALCULATION OF r BY THE DIFFERENCE-FORMULA 


It is apparent from the preceding sections that the product-moment 
formula for r may be written in several ways, depending upon 
whether deviations are taken from actual or assumed means and 
upon whether raw scores or deviations are employed. The present 
section contributes still another formula for calculating r—namely, 
the difference-formula. This formula will complete our list of expres- 
sions for r, as it is believed that the student who understands the 
meaning and use of the correlation formulas given in this chapter will 
have no difficulty with other variations which he may encounter.

The formula for $r$ by the difference method is

$$r = \frac{\Sigma x^2 + \Sigma y^2 - \Sigma d^2}{2\sqrt{\Sigma x^2 \times \Sigma y^2}} \qquad (27)$$

(coefficient of correlation by difference-formula, deviations
from the means of the distributions)

in which $\Sigma d^2 = \Sigma (x - y)^2$.

The principal advantage of the difference-formula is that no cross
products ($xy$'s) need be computed. For this reason, this formula is
employed in several of the printed correlation charts. Formula (27)
is illustrated in Table 22 with the same data used in Table 19 and
elsewhere in this chapter. Note that the $x$, $y$, $x^2$, and $y^2$ columns
repeat Table 18. The $d$ or $(x - y)$ column is found by subtracting
algebraically each y-deviation from its corresponding x-deviation.
These differences are then squared and entered in the $d^2$ or $(x - y)^2$
column. Substitution of $\Sigma x^2$, $\Sigma y^2$, and $\Sigma d^2$ in formula (27) gives
$r = .78$.


TABLE 22 To illustrate the calculation of r from ungrouped data by the
difference-formula, deviations from the means


          Test 1  Test 2                      d                         d²
Subject     X       Y       x       y      (x - y)     x²       y²    (x - y)²

   A        50      22    -12.5   -8.4      -4.1     156.25    70.56    16.81
   B        54      25     -8.5   -5.4      -3.1      72.25    29.16     9.61
   C        56      34     -6.5    3.6     -10.1      42.25    12.96   102.01
   D        59      28     -3.5   -2.4      -1.1      12.25     5.76     1.21
   E        60      26     -2.5   -4.4       1.9       6.25    19.36     3.61
   F        62      30      -.5    -.4       -.1        .25      .16      .01
   G        61      32     -1.5    1.6      -3.1       2.25     2.56     9.61
   H        65      30      2.5    -.4       2.9       6.25      .16     8.41
   I        67      28      4.5   -2.4       6.9      20.25     5.76    47.61
   J        71      34      8.5    3.6       4.9      72.25    12.96    24.01
   K        71      36      8.5    5.6       2.9      72.25    31.36     8.41
   L        74      40     11.5    9.6       1.9     132.25    92.16     3.61

                                                     595.00   282.92   234.92

M_X = 62.5     M_Y = 30.4

$$r = \frac{595.00 + 282.92 - 234.92}{2\sqrt{595 \times 282.92}} = .78 \qquad (27)$$

Another form of the difference-formula is often useful, especially
in machine calculation. This version makes use of raw or obtained
scores:

$$r = \frac{N[\Sigma X^2 + \Sigma Y^2 - \Sigma (X - Y)^2] - 2(\Sigma X)(\Sigma Y)}{2\sqrt{[N \Sigma X^2 - (\Sigma X)^2][N \Sigma Y^2 - (\Sigma Y)^2]}} \qquad (28)$$

(coefficient of correlation by difference-formula, calculation
from raw or obtained scores)

in which $\Sigma (X - Y)^2$ is the sum of the squared differences between
the two sets of scores.
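Both forms of the difference-formula can be tried on the twelve pairs of scores with the Python sketch below. The function names are illustrative. Neither function computes a cross product; each relies on the identity $\Sigma (x - y)^2 = \Sigma x^2 + \Sigma y^2 - 2\Sigma xy$, which is why the results must agree with formulas (24) and (26).

```python
import math

test1 = [50, 54, 56, 59, 60, 62, 61, 65, 67, 71, 71, 74]
test2 = [22, 25, 34, 28, 26, 30, 32, 30, 28, 34, 36, 40]

def r_difference_deviations(xs, ys):
    """Formula (27): deviations taken from the actual means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    x = [v - mx for v in xs]
    y = [v - my for v in ys]
    sx2 = sum(d * d for d in x)
    sy2 = sum(d * d for d in y)
    sd2 = sum((a - b) ** 2 for a, b in zip(x, y))   # sum of squared differences
    return (sx2 + sy2 - sd2) / (2 * math.sqrt(sx2 * sy2))

def r_difference_raw(xs, ys):
    """Formula (28): the same idea applied to raw or obtained scores."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sx2 = sum(v * v for v in xs)
    sy2 = sum(v * v for v in ys)
    sd2 = sum((a - b) ** 2 for a, b in zip(xs, ys))
    num = n * (sx2 + sy2 - sd2) - 2 * sx * sy
    den = 2 * math.sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))
    return num / den

r27 = r_difference_deviations(test1, test2)
r28 = r_difference_raw(test1, test2)
print(round(r27, 2), round(r28, 2))
```

Both functions give .78 for these data.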


3. Averaging coefficients of correlation 


It has been a fairly common practice to average correlation coeffi- 
cients computed from tests given to comparable groups in order to 
obtain a generalized picture of the relationship between the two vari- 
ables. The averaging of r's is, however, a dubious and often an 
incorrect procedure. In the first place, $r$'s do not vary along a linear
scale so that the increase from .40 to .50 does not mean the same
increase in relationship as does an increase from .80 to .90. Secondly,
when $+r$'s and $-r$'s are averaged, they tend to cancel each other out.
If $r$'s do not differ greatly in size, their arithmetic mean will yield a
useful result; but this is not true when $r$'s differ widely in size or in
sign. Averaging an $r$ of .70 and an $r$ of .60 to obtain .65 is
permissible; but averaging an $r$ of .90 and an $r$ of .10 to obtain .50 is not.

The safest plan is not to average r's at all. When for various rea- 
sons averaging seems to be demanded by the problem, the best 
method is to transform the $r$'s into Fisher's z-function (p. 426), and
take the arithmetic mean of the $z$'s. The mean $z$ may then be
converted into an equivalent $r$. An example will illustrate the procedure
to be followed in converting $r$'s to $z$'s.


Example (1) In 5 parallel experiments the following $r$'s are
obtained between the same two variables: .50, .90, .40, .30, and .70.
What is the mean of these coefficients?


By Table C we may convert these $r$'s into the following $z$'s: .55,
1.47, .42, .31, and .87. The mean of these $z$'s is .72, which is equivalent
to an $r$ of .62. Comparison of this mean $r$ with .56, the average of
the $r$'s as they stand, gives an idea of the correction effected in
using $z$.
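Example (1) can be reproduced without a table, on the assumption that Fisher's z-function is the inverse hyperbolic tangent, $z = \tfrac{1}{2}\log_e\dfrac{1+r}{1-r}$, which is its usual definition. The Python sketch below uses the standard library functions `math.atanh` and `math.tanh`; the function name `mean_r_via_fisher_z` is hypothetical.

```python
import math

def mean_r_via_fisher_z(rs):
    """Average correlation coefficients through Fisher's z-transformation."""
    zs = [math.atanh(r) for r in rs]      # r -> z
    mean_z = sum(zs) / len(zs)
    return math.tanh(mean_z), mean_z, zs  # mean z -> equivalent r

rs = [0.50, 0.90, 0.40, 0.30, 0.70]
mean_r, mean_z, zs = mean_r_via_fisher_z(rs)
plain_mean = sum(rs) / len(rs)

print([round(z, 2) for z in zs], round(mean_z, 2),
      round(mean_r, 2), round(plain_mean, 2))
```

The $z$'s agree with those read from the table (.55, 1.47, .42, .31, .87), the mean $z$ is .72, and the equivalent $r$ is .62, as against .56 for the simple average.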


PROBLEMS 


1. Find the correlation between the two sets of scores given below, using 
the ratio method (p. 126). 


Subjects   X    Y
   a      15   40
   b      18   42
   c      22   50
   d      17   45
   e      19   43
   f      20   46
   g      16   41
   h      21   41


2. The scores given below were achieved upon Army Alpha and
Typewriting Tests by 100 students in a typewriting class. The typewriting
scores are in number of words written per minute, with certain
penalties. Find the coefficient of correlation. Use an interval of 5 units for
Y and an interval of 10 units for X.

Typing (Y) Alpha (X) Typing (Y) Alpha (X) Typing (Y) Alpha (X) 


46 
31 
46 
40 
42 
41 
39 


152 

96 
171 
172 
138 
154 
127 
156 
156 
133 
173 
134 
179 
159 
167 
136 
153 
145 
134 
184 
154 


164 
127 
144 
160 
106 

95 
146 
175 
126 
120 
154 
146 
154 
159 
175 
164 
11 
164 
119 
160 
149 
149 
143 
159 
157 
153 
149 
163 
175 
133 
178 
168 
156 


40 
36 
43 
48 
45 
58 
23 
45 
44 
47 
29 
46 
46 
39 
49 
34 
41 


120 
140 
141 
143 
138 
149 
142 
166 
138 
150 
148 
166 
146 
167 
139 
183 
150 
179 
138 
136 
172 
145

3. In the correlation table given below compute the coefficient of
correlation.


Boys: Ages 4.5 to 5.5 Years
Weight in Pounds (X)


29-33 | 34-38 | 39-43 | 44-48


45-47
42-44
39-41
36-38
33-35
30-32
Totals


4. In the following correlation table compute the coefficient of correlation. 
Army Alpha I.Q.'s


School Marks | 115-119 | 120-124


90 and over

5. Compute the coefficient of correlation between the Algebra Test scores
and I.Q.'s shown in the table below.


ALGEBRA TEST SCORES


6. Compute the correlation between the two sets of scores given below 


(a) when deviations are taken from the means of the two series [use 
formula (24) ]; 

(b) when the means are taken at zero. First reduce the scores by sub- 
tracting 150 from each of the scores in Test 1, and 40 from each of 
the scores in Test 2.


Test 1 Test 2 Test 1 Test 2 
150 60 139 41 
126 40 155 43 
135 45 147 37 
176 50 162 58 
138 56 156 48 
142 43 146 39 
151 57 133 31 
163 38 168 46 
137 41 153 52 
178 55 150 57 


7. Find the correlation between the two sets of memory-span scores given 
below (the first series is arranged in order of size) (a) when deviations 
are taken from assumed means [formula (22)], (b) by the difference- 
method given on page 145. 


Test 1
(digit span)
15
14
13
12
11


Test 2
(letter span)
12
14
10
8
12
9
12


ANSWERS
REGRESSION AND PREDICTION 


I. The Regression Equations


1. The problem of predicting one variable from another 


Suppose that in a group of 120 college students (p. 129), we wish 
to estimate a certain man's height knowing his weight to be 153 
pounds. The best possible “guess” that we can make of this man’s 
height is the mean height of all of the men who fall in the 150-159 
weight-interval. In Figure 39 the mean height of the nine men in 
this column is 68.9 inches, which is, therefore, the most likely height 
of a man who weighs 153 pounds. In the same way, the most prob- 
able height of a man who weighs 136 pounds is 66.6 inches, the mean 
height of the thirty-seven men who fall in weight-column 130-139 
pounds. And, in general, the most probable height of any man in the 
group is the mean of the heights of all of the men who weigh the same 
(or approximately the same) as he, i.e., who fall within the same 
weight-column. 

Turning to weight, we can make the same kind of estimates. Thus, 
the best possible “guess” that we can make of a man’s weight know- 
ing his height to be 66.5 inches is 135.1 pounds, viz., the mean weight 
of the thirty-three men who fall in the height-interval 66-67 inches. 
Again, in general, the most probable weight of any man in the group 
is the mean weight of all of the men who are of the same (or approxi- 
mately the same) height. 

Our illustration shows that from the scatter diagram alone it is 
possible to “predict” one variable from another. But the prediction 
is rough, and is obviously subject to a large “error of estimate.” * 


* See page 161.

[Correlation chart of height (Y) against weight in pounds (X), weight
intervals 100-109 through 170-179, with the two regression lines drawn in.]

FIG. 39 Illustrating positions of regression lines and calculation of the
regression equations (See Fig. 38, p. 135.)


r = .60
M_X = 136.3 pounds
M_Y = 66.5 inches

For plotting on the chart, regression equations are written with $\sigma_x$
and $\sigma_y$ in class-interval units, viz., $\bar{y} = .51x$ and $\bar{x} = .71y$ (p. 154).

Calculation of Regression Equations

I. Deviation Form

(1) $\bar{y} = .60 \times \dfrac{2.62}{15.54}\,x = .10x$   (29)
(2) $\bar{x} = .60 \times \dfrac{15.54}{2.62}\,y = 3.56y$   (30)

II. Score Form

(1) $\bar{Y} - 66.5 = .10(X - 136.3)$ or $\bar{Y} = .10X + 52.9$   (31)
(2) $\bar{X} - 136.3 = 3.56(Y - 66.5)$ or $\bar{X} = 3.56Y - 100.4$   (32)

Calculation of Standard Errors of Estimate

$\sigma_{(est.\,Y)} = 2.62\sqrt{1 - .60^2} = 2.10$ inches   (33)
$\sigma_{(est.\,X)} = 15.54\sqrt{1 - .60^2} = 12.43$ pounds   (34)

Moreover, while we have made use of the fact that the means are the 
most probable points in our arrays (columns or rows), we have made 
no use of our knowledge concerning the over-all relationship between 
the two variables. The two regression lines in Figure 39 are
determined by the correlation between height and weight and their degree
of separation indicates the size of the correlation coefficient * (p. 131).
Consequently, they describe more regularly, and in a more
generalized fashion than do the series of short straight lines joining the
means, the relationship between height and weight over the whole
range (see also p. 153). A knowledge of the equations of these lines is
necessary if we are to make a prediction based upon all of our data.
Given the weight (X) of a man comparable to those in our group,
for example, if we substitute for X in the equation connecting Y and 
X we are able to predict this man's height more accurately than if 
we simply took the mean of his height array. The task of the next 
section will be to develop equations for the two regression lines by 
means of which predictions from X to Y or from Y to X can be made. 
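As a preview of what such equations accomplish, the score-form equations and standard errors of estimate listed with Figure 39 can be sketched in Python. The constants are those given for the height-weight data ($r = .60$, $M_X = 136.3$, $M_Y = 66.5$, $\sigma_x = 15.54$, $\sigma_y = 2.62$); the function names are hypothetical, and the regression coefficients are carried unrounded, so predictions may differ in the last decimal from those obtained with the rounded values .10 and 3.56.

```python
import math

r = 0.60
mean_x, mean_y = 136.3, 66.5       # mean weight (pounds), mean height (inches)
sigma_x, sigma_y = 15.54, 2.62     # standard deviations in score units

b_yx = r * sigma_y / sigma_x       # regression coefficient of Y on X
b_xy = r * sigma_x / sigma_y       # regression coefficient of X on Y

def predict_height(weight):
    """Score form: Y - M_Y = b_yx * (X - M_X)."""
    return mean_y + b_yx * (weight - mean_x)

def predict_weight(height):
    """Score form: X - M_X = b_xy * (Y - M_Y)."""
    return mean_x + b_xy * (height - mean_y)

# Standard errors of estimate: sigma * sqrt(1 - r^2)
se_height = sigma_y * math.sqrt(1 - r ** 2)
se_weight = sigma_x * math.sqrt(1 - r ** 2)

print(round(b_yx, 2), round(b_xy, 2))
print(round(predict_height(137.3), 1), round(predict_weight(66.5), 1))
print(round(se_height, 2), round(se_weight, 2))
```

The two regression coefficients come to about .10 and 3.56, and the standard errors of estimate to about 2.10 inches and 12.43 pounds. A man one pound above the mean weight is predicted to stand about .10 inch above the mean height.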


2. The two regression equations in deviation form 


The equations of the two regression lines in a correlation table
represent the straight lines which "best fit" the means of the successive
columns and rows in the table. Using as a definition of "best fit" the
criterion of "least squares," † Pearson worked out the equation of
the line which goes through, or as close as possible to, more of the
column-means than any other straight line; and the equation of the
line which goes through, or as close as possible to, more of the
row-means than any other straight line. These two lines are "best fitting"
in a mathematical sense, the one to the observations of the columns
and the other to the observations of the rows.

The equation of the first regression line, the line drawn to
represent the trend of the crosses in Figure 39, is as follows:

$$\bar{y} = r\,\frac{\sigma_y}{\sigma_x}\,x \qquad (29)$$

(regression equation of y on x, deviations taken from
the means of Y and X)

The factor $r\dfrac{\sigma_y}{\sigma_x}$ is called the regression coefficient, and is often
replaced in (29) by the term $b_{yx}$ or $b_{12}$ so that formula (29) may be
written $\bar{y} = b_{yx} \times x$, or $\bar{y} = b_{12} \times x$. The bar over the $y$ means that
our estimate is an average value.

* The term "regression" was first used by Francis Galton with reference to the
inheritance of stature. Galton found that children of tall parents tend to be
less tall, and children of short parents less short, than their parents. In other
words, the heights of the offspring tend to "move back" toward the mean
height of the general population. This tendency toward maintaining the "mean
height" Galton called the principle of regression, and the line describing the
relationship of height in parent and offspring was called a "regression line."
The term is still employed, although its original meaning of "stepping back" to
some stationary average is not necessarily implied (see p. 171).

† For an elementary mathematical treatment of the method of least squares
as applied to the problem of fitting regression lines, see Walker, H. M.,
Elementary Statistical Method (New York: Henry Holt and Co., 1943), pp. 308-310.

If we substitute in formula (29) the values of r, $\sigma_y$, and $\sigma_x$ obtained
from Figure 39, we have

$\bar{y} = .60 \times \frac{2.62}{15.54} x$, or $\bar{y} = .10x$


This equation gives the relationship of deviations from mean height
to deviations from mean weight. When $x = +1.00$, $\bar{y} = +.10$; that is,
a deviation of one pound from the mean of the X's (weight) is accompanied
by a deviation of .10 inch from the mean of the Y's (height).
The man who stands one pound above the mean weight of the group,
therefore, is most probably .10 inch above the mean height. Since
this man's weight is 137.3 pounds (136.3 + 1.00), his height is most
probably 66.6 inches (66.5 + .10). Again, the man who weighs 120
pounds, i.e., is 16.3 pounds below the mean of the group, is most
probably 64.9 inches tall, or about 1.6 inches below the mean height
of the group. To get this last value, substitute $x = -16.3$ in the equation
above to get $\bar{y} = -1.63$, and refer this value to its mean. The
regression equation is a generalized expression of relationship. It
tells us that the most probable deviation of an individual in our group
from the $M_y$ is just .10 of his deviation from the $M_x$.


The equation $\bar{y} = r \frac{\sigma_y}{\sigma_x} x$ gives the relationship between Y and X
in deviation form. This designation is appropriate since the two
variables are expressed as deviations from their respective means
(i.e., as x and y); hence, for a given deviation from $M_x$ the equation
gives the most probable accompanying deviation from $M_y$.

The equation of the second regression line, the line drawn through
the circles (i.e., the means) of the rows in Figure 39, is

$\bar{x} = r \frac{\sigma_x}{\sigma_y} y$    (30)

(regression equation of x on y, deviations taken from
the means of X and Y)


As in the first regression equation, the regression coefficient
$r \frac{\sigma_x}{\sigma_y}$ is often replaced by the expression $b_{xy}$ or $b_{21}$, and formula (30)
written $\bar{x} = b_{xy} \times y$ or $\bar{x} = b_{21} \times y$.
If we substitute for r, $\sigma_x$, and $\sigma_y$ in formula (30), we have

$\bar{x} = .60 \times \frac{15.54}{2.62} y$, or $\bar{x} = 3.56y$

from which it is evident that a deviation of 1 inch from the $M_y$, or
from 66.5 inches, is accompanied by a deviation of 3.56 pounds from
the $M_x$, or from 136.3 pounds. Expressed generally, the most probable
deviation of any man from the mean weight is just 3.56 times
his deviation from the mean height. Accordingly, a man 67 inches
tall, or .5 inch above the mean height (66.5 + .5 = 67), most probably
weighs 138.1 pounds, or is 1.8 pounds above the mean weight
(136.3 + 1.8). (Substituting y = .5 in the equation gives $\bar{x} = 1.8$.)


Equation $\bar{x} = r \frac{\sigma_x}{\sigma_y} y$ gives the relationship between X and Y in
deviation form. That is to say, it gives the most probable deviation
of an X-measure from $M_x$ corresponding to a known deviation in the
Y-measure from $M_y$.
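
The two deviation-form equations can be checked numerically. The following is a minimal sketch in Python, assuming the Figure 39 values quoted in the text (r = .60, a height sigma of 2.62 inches, a weight sigma of 15.54 pounds); the function and variable names are illustrative and do not come from the text.

```python
# Values quoted in the text for Figure 39 (Y = height in inches, X = weight in pounds).
R = 0.60
SIGMA_Y = 2.62
SIGMA_X = 15.54


def regression_coefficient(r, sigma_predicted, sigma_known):
    """Return r * (sigma of the predicted variable) / (sigma of the known variable)."""
    return r * sigma_predicted / sigma_known


b_yx = regression_coefficient(R, SIGMA_Y, SIGMA_X)  # slope of formula (29)
b_xy = regression_coefficient(R, SIGMA_X, SIGMA_Y)  # slope of formula (30)


def predict_y_deviation(x):
    """Formula (29): most probable height deviation for a weight deviation x."""
    return b_yx * x


def predict_x_deviation(y):
    """Formula (30): most probable weight deviation for a height deviation y."""
    return b_xy * y


print(round(b_yx, 2), round(b_xy, 2))
print(round(predict_y_deviation(-16.3), 2))  # man weighing 120 pounds
print(round(predict_x_deviation(0.5), 2))    # man 67 inches tall
```

Because the sketch keeps the unrounded slope, the 120-pound man's height deviation comes out slightly larger in size than the -1.63 obtained in the text from the rounded coefficient .10.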

Although both regression equations given above involve x and y,
the two equations cannot be used interchangeably; neither can be
used to predict both x and y. This is an important fact which the
student must understand clearly and constantly bear in mind. The
first regression equation $\bar{y} = r \frac{\sigma_y}{\sigma_x} x$ can be used only when y is to be
predicted from a given x (when y is the "dependent" variable) *;
while the second equation $\bar{x} = r \frac{\sigma_x}{\sigma_y} y$ can be used only when x is to
be predicted from a known y (when x is the "dependent" variable).
There are always two regression equations in a correlation table,
the one through the means of the columns and the other through the
means of the rows, unless the correlation is 1.00 or -1.00. When
r = 1.00, $\bar{y} = r \frac{\sigma_y}{\sigma_x} x$ becomes $\bar{y} = \frac{\sigma_y}{\sigma_x} x$, or $\bar{y}\sigma_x = x\sigma_y$. Also, when
r = 1.00, $\bar{x} = r \frac{\sigma_x}{\sigma_y} y$ becomes $\bar{x} = \frac{\sigma_x}{\sigma_y} y$, or $\bar{x}\sigma_y = y\sigma_x$. In short,
when the correlation is perfect (±1.00), the two equations are identical
and the two regression lines coincide. To illustrate this situation,
suppose that the correlation between height and weight in Figure 39
were perfect. The first regression equation would then be
$\bar{y} = 1.00 \times \frac{2.62}{15.54} x$, or $\bar{y} = .17x$, and the second, $\bar{x} = 1.00 \times \frac{15.54}{2.62} y$, or
$\bar{x} = 5.93y$. Algebraically, the equation x = 5.93y is equal to y = .17x;
for if we write .17 as $\frac{1}{5.93}$, then $y = \frac{x}{5.93}$, or x = 5.93y. When r = ±1.00 there is only one
equation and a single regression line. Moreover, if r = ±1.00, and in
addition $\sigma_x = \sigma_y$, the single regression line makes an angle of 45° or
135° with the horizontal axis, since y = ±x.

* The dependent variable takes its value from the other (independent)
variable in the equation. For example, in the equation y = 5x - 10, y "depends"
for its value upon x; hence y is the dependent variable.
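
The reason the two equations cannot be interchanged can be seen by solving the first equation for x and comparing the result with the second. The sketch below is only an illustration in Python using the Figure 39 values quoted in the text; the helper name `slopes` is hypothetical.

```python
R = 0.60
SIGMA_Y = 2.62   # inches
SIGMA_X = 15.54  # pounds


def slopes(r, sigma_y, sigma_x):
    """Return (b_yx, b_xy), the slopes of formulas (29) and (30)."""
    return r * sigma_y / sigma_x, r * sigma_x / sigma_y


b_yx, b_xy = slopes(R, SIGMA_Y, SIGMA_X)

# Solving y = b_yx * x for x gives x = y / b_yx. Compare that slope with b_xy.
inverted_first = 1.0 / b_yx
print(round(inverted_first, 2), round(b_xy, 2))

# With a perfect correlation the inverted first equation equals the second,
# so the two regression lines coincide.
p_yx, p_xy = slopes(1.00, SIGMA_Y, SIGMA_X)
print(round(1.0 / p_yx, 2), round(p_xy, 2))
```

With r = .60 the inverted first equation has a slope well above that of the second equation, so it would overstate the weight deviation that goes with a given height deviation; only at r = 1.00 do the two slopes agree.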


3. Plotting the regression lines in a correlation table *


In Figure 39, the coordinate axes have been drawn in on the correlation
table through the means of the X- and Y-distributions. The
vertical axis is drawn through 136.3 pounds ($M_x$), and the horizontal
axis through 66.5 inches ($M_y$). These axes intersect close to the
center of the chart. Equations (29) and (30) define straight lines
which pass through the origin or point of intersection of these
coordinate axes. It is a comparatively simple task to plot in our regression
lines on the correlation chart with reference to the given coordinate
axes.


FIG. 40 Plot of the straight line, y = 2x


* A brief review of the equation of a straight line, and of the method of
plotting a simple linear equation is given here in order to simplify the plotting
of the regression equations.

In Figure 40, let X and Y be coordinate axes, or axes of reference. Now
suppose that we are given the equation y = 2x and are required to represent
the relation between x and y graphically. To do this we assign values to x in
the equation and compute the corresponding values of y. When x = 2, for example,
y = 2 × 2 or 4; when x = 3, y = 2 × 3 or 6. In the same way, given any
x-value we can compute the value of y which will "satisfy" the equation, that
is, make the left side equal to the right. If the series of x and y values found
from the equation are plotted on the diagram with respect to the X- and
Y-coordinates (as in Fig. 40) they will be found to fall along a straight line.
This straight line pictures the relation y = 2x. It goes through the origin, since
when x = 0, y = 0. The equation y = 2x represents, then, a straight line which
passes through the origin; and the relation of its coordinates (points lying along
the line) is such that $\frac{y}{x}$, called the slope of the line, is always equal to 2.

The general equation of any straight line which passes through the origin may
be written y = mx, where m is the slope of the line. If we replace m in the
general formula by $r \frac{\sigma_y}{\sigma_x}$ it is clear that the regression line in deviation form,
namely, $\bar{y} = r \frac{\sigma_y}{\sigma_x} x$, is simply the equation of a straight line which goes through
the origin. For the same reason, when the general equation of a straight line
through the origin is written x = my, $\bar{x} = r \frac{\sigma_x}{\sigma_y} y$ is also seen to be a straight line
through the origin, its slope being $r \frac{\sigma_x}{\sigma_y}$.

Correlation charts are usually laid out with equal distances representing
the X and Y class-intervals (the printed correlation charts
are always so constructed) although the intervals expressed in terms
of the variables themselves may be, and often are, unequal and
incommensurable. This is true in Figure 39. In this diagram, the intervals
in X and Y appear to be equal, although the actual interval for
height is 2 inches, and the actual interval for weight is 10 pounds.
Because of this difference in interval-length in the two variables it
is very important that we express $\sigma_x$ and $\sigma_y$ in our regression equations
in class-interval units before plotting the regression lines on the
chart. Otherwise we must equate our X and Y intervals by laying
out our diagram in such a way as to make the X-interval five times
the Y-interval. This latter method of equating intervals is impractical,
and is rarely used, since all we need do in order to use correlation
charts drawn up with equal intervals is to express $\sigma_x$ and $\sigma_y$ in
formulas (29) and (30) in units of interval. When this is done, and
the interval, not the score, is the unit ($\sigma_y = 2.62/2 = 1.31$ and
$\sigma_x = 15.54/10 = 1.55$), the first regression equation becomes

$\bar{y} = .60 \times \frac{1.31}{1.55} x = .51x$

and the second

$\bar{x} = .60 \times \frac{1.55}{1.31} y$, or $\bar{x} = .71y$

Since each regression line goes through the origin, only one other
point (besides the origin) is needed in order to determine its course.
In the first regression equation, if x = 10, $\bar{y}$ = 5.1; and the two points
(0, 0) and (10, 5.1) locate the line. In the second regression equation,
if y = 10, $\bar{x}$ = 7.1; and the two points (0, 0) and (7.1, 10) determine
the second line. In plotting points on a diagram any convenient
scale may be employed. A millimeter rule is useful.
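
The conversion to interval units can be sketched as follows. This is an illustrative Python calculation using the class intervals stated in the text (2 inches for height, 10 pounds for weight); the variable names are assumptions made for the example.

```python
R = 0.60
SIGMA_Y_SCORE = 2.62    # inches
SIGMA_X_SCORE = 15.54   # pounds
INTERVAL_Y = 2.0        # inches per class interval
INTERVAL_X = 10.0       # pounds per class interval

# Express each sigma in class-interval units instead of score units.
sigma_y_int = SIGMA_Y_SCORE / INTERVAL_Y
sigma_x_int = SIGMA_X_SCORE / INTERVAL_X

slope_y_on_x = R * sigma_y_int / sigma_x_int  # plotting form of formula (29)
slope_x_on_y = R * sigma_x_int / sigma_y_int  # plotting form of formula (30)

# Each line passes through the origin, so one further point fixes it.
point_first = (10, round(slope_y_on_x * 10, 1))    # (x, y) on the first line
point_second = (round(slope_x_on_y * 10, 1), 10)   # (x, y) on the second line

print(round(slope_y_on_x, 2), point_first)
print(round(slope_x_on_y, 2), point_second)
```

These slopes apply only to distances measured in class intervals on the chart; they are not the coefficients to use when score deviations are wanted.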

It is important for the student to remember that when the two $\sigma$'s
are expressed in interval units, regression equations do not give the
relationship between the X and Y score deviations. These special
forms of the regression equations should not be used except when
plotting the equations on a correlation chart. Whenever the most
probable deviation in the one variable corresponding to a known
deviation in the other is wanted, formulas (29) and (30), in which
the $\sigma$'s are expressed in score units, must be employed.


4. The regression equations in score form 


In the last sections it was pointed out that formulas (29) and (30)
give the equations of the regression lines in deviation form, that is, that
values of x and y substituted in these equations are deviations from
the means of the X and Y distributions, and are not scores. While
the equations in deviation form are actually all that one needs in
order to pass from one variable to another, it is decidedly convenient
to be able to estimate an individual's actual score in Y, say, directly
from the score in X without first converting the X-score into a deviation
from $M_x$. This can be done by using the score form of the
regression equation. The conversion of deviation form to score form
is made as follows: Denoting the mean of the Y's by $M_y$ and any
Y-score simply by Y, we may write the deviation of any individual
from the mean as $Y - M_y$ or, in general, $y = Y - M_y$. In the same
way, $x = X - M_x$ when x is the deviation of any X-score from the
mean of X. If we substitute $Y - M_y$ for y, and $X - M_x$ for x, in formulas
(29) and (30), the two regression equations become

$\bar{Y} - M_y = r \frac{\sigma_y}{\sigma_x} (X - M_x)$, or

$\bar{Y} = r \frac{\sigma_y}{\sigma_x} (X - M_x) + M_y$    (31)

and

$\bar{X} - M_x = r \frac{\sigma_x}{\sigma_y} (Y - M_y)$, or

$\bar{X} = r \frac{\sigma_x}{\sigma_y} (Y - M_y) + M_x$    (32)

(regression equations of Y on X and X on Y in score form)


These two equations are said to be in score form, since the X and Y in
both equations represent actual scores, and not deviations from the
means of the two distributions.
If we substitute in (31) the values of $M_y$, r, $\sigma_y$, $\sigma_x$, and $M_x$ obtained
from Figure 39, the regression of height on weight in score form
becomes

$\bar{Y} = .60 \times \frac{2.62}{15.54} (X - 136.3) + 66.5$

or upon reduction

$\bar{Y} = .10X + 52.9$


To illustrate the use of this equation, suppose that a man in our
group weighs 160 pounds and we wish to estimate his most probable
height. Substituting 160 for X in the equation, $\bar{Y}$ = 69 inches; and
accordingly, the most probable height of a man who weighs 160
pounds is 69 inches.

If the problem is to predict weight instead of height, we must use
the second regression equation, formula (32). Substituting for $M_x$,
r, $\sigma_x$, $\sigma_y$, and $M_y$ in (32) we have

$\bar{X} = .60 \times \frac{15.54}{2.62} (Y - 66.5) + 136.3$

or

$\bar{X} = 3.56Y - 100.4$


Now if a man is 71 inches tall, we find, on replacing Y by 71 in the
equation, that $\bar{X}$ = 152.4. Hence the most probable weight of a man
who is 71 inches tall is about 152½ pounds.
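
The score-form equations (31) and (32) lend themselves to a direct computation. The sketch below is an illustrative Python version using the means, sigmas, and r quoted in the text for Figure 39; the function names are hypothetical.

```python
# Figure 39 values quoted in the text.
R = 0.60
M_Y, SIGMA_Y = 66.5, 2.62      # height in inches
M_X, SIGMA_X = 136.3, 15.54    # weight in pounds


def predict_height(weight):
    """Formula (31): most probable height (inches) for a given weight (pounds)."""
    return R * SIGMA_Y / SIGMA_X * (weight - M_X) + M_Y


def predict_weight(height):
    """Formula (32): most probable weight (pounds) for a given height (inches)."""
    return R * SIGMA_X / SIGMA_Y * (height - M_Y) + M_X


print(round(predict_height(160), 1))  # man weighing 160 pounds
print(round(predict_weight(71), 1))   # man 71 inches tall
```

The sketch keeps the slopes unrounded, so its results can differ in the first decimal from the text, which rounds the slopes to .10 and 3.56 before reducing the equations.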


5. The meaning of a "prediction" from the regression equation 


It may seem strange, perhaps, to talk of "predicting" a man's 
height from his weight, when the heights and weights of the 120 men 
in our group are already known. When we have measures of both 
height and weight it is unnecessary, of course, to estimate one from 
the other. But suppose that all we know about a given individual is 
his weight and the fact that he falls within the age-range of our group 
of 120 men. Since we know the correlation between height and 
weight to be .60, it is possible from the regression equation to predict 
the most probable height of our subject in lieu of actually measuring 
him. Furthermore, the regression equation may be employed to 
estimate the height of any man in the population from which our 
group is chosen, provided our sample is an unbiased selection from 
the larger group. A regression equation holds, of course, only for the
population from which the sample group was drawn. We cannot
estimate the heights of children or of women from a regression equation
which describes the relationship between height and weight in
men between the ages of eighteen and twenty-five years (the age-range
of the students in our group). Conversely, we cannot expect a
regression equation established for elementary-school children to
hold for older groups.

Height and weight, since they are both easily measured, perhaps 
do not demonstrate the value of the regression equation so clearly as 
do other and more complex traits. These variables were chosen for 
our “model” problem because they are objective and observable and 
their meaning is definite. Let us now consider a problem of more 
direct psychological interest. Suppose that in a group of 300 high- 
school children of nearly the same age, the correlation between a 
group intelligence test given at the beginning of the school year and 
average grades made in the first year of high school is .60. Now if we 
administer the group test to a child who enters school the next year,
it is possible from his score to estimate his probable scholastic performance
by means of the regression equation between test score and
grades obtained from the previous year's class. Forecasts of this
sort are useful in educational prognosis and guidance.* The same
is true of vocational guidance; we are often able to predict from a
test battery the probable success of an individual who contemplates
entering a certain trade or profession. Advice on such a basis is
measurably better than subjective judgment.


II. The Reliability of Predictions ‡


1. The standard error of estimate 


The values of X and Y "predicted" from regression equations have
been constantly referred to as being the "most probable" values of
the one variable accompanying the given value of the other. In order
to show just how probable such estimates are it is necessary that we
calculate their standard errors of estimate. The accuracy with which
we are able to predict Y-scores from equation (31) is given by the
formula

$\sigma_{(est.\ Y)} = \sigma_y \sqrt{1 - r^2}$    (33)

[standard error of a Y-score predicted from equation (31)]


in which $\sigma_y$ is the $\sigma$ of the Y distribution, and r is the coefficient of
correlation. The subscript "est." is used to distinguish this standard
error from the $\sigma$ of the distribution.

* Edgerton, H. A., Academic Prognosis in the University, Educational
Psychology Monographs, 1930, 27.

† Stead, W. H., and Shartle, C. L., Occupational Counseling Techniques (New
York: American Book Co., 1940).

‡ This section may be omitted until after Chapter 8.

From formula (31) we have calculated the most probable height
of a man weighing 160 pounds to be 69 inches. The reliability of this
prediction is obtained by substituting $\sigma_y$ and r in formula (33)
to find

$\sigma_{(est.\ Y)} = 2.62 \sqrt{1 - .60^2} = 2.1$ inches


We now say that the most probable height of a man weighing 160
pounds is 69 inches with a $\sigma_{(est.)}$ of 2.1 inches; and that the chances
are about two in three that our prediction does not miss the man's
actual height by more than ±2.1 inches. We may feel quite certain
that the estimated height of this man does not miss his true height
by more than $\pm 3\sigma_{(est.)}$, or by more than 6.3 inches (p. 185).

The degree of accuracy with which X-scores can be predicted from
(32) is given by the formula

$\sigma_{(est.\ X)} = \sigma_x \sqrt{1 - r^2}$    (34)

[standard error of an X-score predicted from equation (32)]


in which $\sigma_x$ is the $\sigma$ of the X distribution, and r is the coefficient of
correlation.

We found on page 160 that the most probable weight of a man in
our group who is 71 inches tall is 152.4 pounds. The $\sigma_{(est.)}$ of this
prediction from (34) is

$\sigma_{(est.\ X)} = 15.54 \sqrt{1 - .60^2} = 12.4$ pounds


and the most probable weight of any man 71 inches tall, in our group
or in the population from which our sample was drawn, is 152.4
pounds with a $\sigma_{(est.)}$ of 12.4 pounds. The chances, therefore, are
about two in three that our prediction does not miss our man's true
weight by more than ±12.4 pounds.
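
Formulas (33) and (34) differ only in which sigma is supplied, so a single function covers both. The following is a minimal sketch in Python with the Figure 39 values quoted in the text; the function name is an assumption made for illustration.

```python
import math


def standard_error_of_estimate(sigma, r):
    """Formulas (33)/(34): sigma of the predicted variable times sqrt(1 - r^2)."""
    return sigma * math.sqrt(1.0 - r * r)


R = 0.60
se_height = standard_error_of_estimate(2.62, R)    # predicting height, inches
se_weight = standard_error_of_estimate(15.54, R)   # predicting weight, pounds

print(round(se_height, 1), round(se_weight, 1))

# Bands around the predicted height of 69 inches for a 160-pound man.
predicted_height = 69.0
two_in_three = (predicted_height - se_height, predicted_height + se_height)
near_certain = (predicted_height - 3 * se_height, predicted_height + 3 * se_height)
print([round(v, 1) for v in two_in_three])
print([round(v, 1) for v in near_certain])
```

The bands printed at the end correspond to the text's statements about ±1 and ±3 standard errors of estimate.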


2. The accuracy of individual predictions from regression equations 


The formulas for $\sigma_{(est.)}$ measure the error made in taking predicted,
instead of actual, X and Y measures. If r = 1.00, $\sqrt{1 - r^2}$ is
0, and $\sigma_{(est.)}$ is zero; there is no error of estimate and each person's
measurement is predicted exactly. On the other hand, when r = .00,
$\sqrt{1 - r^2}$ = 1.00, and the error of estimate is equal to the $\sigma$ of the
distribution into which prediction is made. When this last situation
occurs, the regression equation is of no value in enabling us the better
to predict scores, as each person's most probable score (e.g., X) is
simply the mean (i.e., $M_x$). When r = .00 all that we can say definitely
is that a subject's score lies somewhere in the distribution of
Y's or X's. But just where we cannot tell, since our SE * of estimate
equals the SD of the test.

It is clear from formulas (33) and (34) that the accuracy of prediction
from a regression equation depends directly upon the $\sigma$
of the distribution ($\sigma_y$ or $\sigma_x$) and upon the degree of correlation
between the two sets of measures. If the variability ($\sigma_y$) of Y is
small, and the correlation between Y and X high (e.g., .90), values of
Y can be predicted from known values of X with a comparatively
high degree of accuracy. However, when the variability of a test is
large, or the correlation low (or when both conditions exist), prediction
from regression equations becomes so unreliable as to be almost
valueless. Even when the correlation is fairly high, forecasts will
often have an uncomfortably large error of estimate. Thus we have
seen that in spite of the r = .60 between height and weight (Fig. 39),
our forecast of a man's weight, knowing his height, has a $\sigma_{(est.\ X)}$ of
about 12 pounds (p. 162). Predicted heights will, in two-thirds of the
cases, be in error by not more than 2 inches. An example in which
high correlation offsets fairly large variability, permitting reasonably
accurate forecasts, is given later in Figure 41, page 169.

When an investigator uses the regression equations for purposes
of prediction, he should always give the $\sigma_{(est.)}$ of his estimated scores.
The value of a forecast depends, first of all, upon the size of the error
of estimate; but it also depends upon the units of measurement, and
upon the purposes for which the prediction is made (p. 186).


3. The accuracy of group predictions 


We have seen that the standard error of a predicted score,
$\sigma_{(est.)}$, may often be uncomfortably large. Only when r = 1.00 is
$\sqrt{1 - r^2}$ = .00, and only then can an estimate be made without error.
The correlation coefficient must be .87 before $\sqrt{1 - r^2}$ is .50, i.e.,
before the standard error of estimate is reduced 50% below the $\sigma$ of
the test. Obviously, unless r is quite large (larger than we usually
get in practice) the regression equation is of little aid in forecasting
with reasonable accuracy what a given individual may be expected
to do (p. 162). This fact has led many to discount unwisely the value
of correlation in prediction and to conclude that the calculation of r
is not worth the trouble.

* SE = standard error.
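
How slowly the error of estimate shrinks as r rises can be seen by tabulating the factor $\sqrt{1 - r^2}$. The short Python sketch below does this for a few illustrative values of r; the function name and the particular values listed are choices made for the example, not taken from the text.

```python
import math


def error_ratio(r):
    """Ratio of the standard error of estimate to the sigma of the test."""
    return math.sqrt(1.0 - r * r)


for r in (0.00, 0.30, 0.60, 0.87, 0.95, 1.00):
    ratio = error_ratio(r)
    reduction = 100.0 * (1.0 - ratio)
    print(f"r = {r:.2f}  sqrt(1 - r^2) = {ratio:.2f}  reduction = {reduction:.0f}%")

# The r at which the error of estimate is exactly half the sigma of the test.
r_for_half = math.sqrt(1.0 - 0.5 ** 2)
print(round(r_for_half, 2))
```

Even the r of .60 found for height and weight reduces the error of estimate by only one fifth of the sigma of the test.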

Fortunately correlation makes out better in forecasting the per- 
formance of groups than in predicting the most likely achievement 
of a given individual. In forecasting achievement the psychologist
is in much the same position as the insurance statistician or actuary. 
The actuary cannot tell how long John Smith, aged twenty, will live. 
But from his tables, he can tell quite accurately how many of 10,000 
men now aged twenty will live to be thirty, forty, or fifty years old. 
In the same way, the psychologist may be quite uncertain concern- 
ing the performance of a given individual. But knowing the correla- 
tion between a test (or test battery) and some criterion of perform- 
ance, he can forecast, often with considerable accuracy, the probable 
performance of various groups chosen from his distribution of test 
scores. The degree of accuracy in such predictions depends upon the 
size of the correlation coefficient. 

To illustrate “actuarial” prediction in psychology, suppose that 
70% of a freshman class of 400 men achieve grades in their college 
work above the minimum passing mark and hence are regarded as 
“satisfactory” students. Suppose, further, that the correlation be- 
tween a standard intelligence test and freshman performance is .50. 
Now if we had selected the upper half of our group (i.e., the 200 stu- 
dents who performed best on the intelligence test) at the beginning 
of the term, how many of these 200 would have been “satisfactory,” 
i.e., in the upper 70% of the grades distribution? From Table 23 it
can easily be read that 84% of our 200 selected freshmen (i.e., 168)
should be found in the satisfactory group with respect to grades.
The entry .84 is found in column .50 (percentage of test distribution
chosen) opposite the correlation of .50. This result should be compared
with the 70% (i.e., 140) who might be expected to fall in the
satisfactory group when selection is by "guess," without knowledge
of the correlation. This entry is in column .50 opposite the r of .00.

The probable performance of other and smaller groups chosen
from our test distribution can be estimated with much greater
accuracy from Table 23. We know, for example, that 91% of the
best 20% of our students (roughly, seventy-three in the first eighty)
can be expected to prove satisfactory in terms of our criterion (i.e.,
being located in the upper 70% of the grade distribution). Read the
entry .91 in column .20 opposite r = .50. If the correlation of the
intelligence test and school grades had been .60 instead of .50, 87%
(174 in 200) of the "best half" according to the test would have been
satisfactory students; and 95% of the "best" 20% on the test should
be satisfactory students. These forecasts are to be compared with
70%, the estimate when r = .00. It is clear that a knowledge of the
correlation greatly improves the estimate, and the larger the r the
better the forecast.

TABLE 23 *  Proportion of students considered satisfactory in terms of
            grades = .70


        Selection Ratio: Proportion Selected on Basis of Tests


   r     .05   .10   .20   .30   .40   .50   .60   .70   .80   .90   .95
  .00    .70   .70   .70   .70   .70   .70   .70   .70   .70   .70   .70
  .05    .73   .73   .72   .72   .72   .71   .71   .71   .71   .70   .70
  .10    .77   .76   .75   .74   .73   .73   .72   .72   .71   .71   .70
  .15    .80   .79   .77   .76   .75   .74   .73   .73   .72   .71   .71
  .20    .83   .81   .79   .78   .77   .76   .75   .74   .73   .71   .71

  .25    .86   .84   .81   .80   .78   .77   .76   .75   .73   .72   .71
  .30    .88   .86   .84   .82   .80   .78   .77   .75   .74   .72   .71
  .35    .91   .89   .86   .83   .82   .80   .78   .76   .75   .73   .71
  .40    .93   .91   .88   .85   .83   .81   .79   .77   .75   .73   .72
  .45    .94   .93   .90   .87   .85   .83   .81   .78   .76   .73   .72
  .50    .96   .94   .91   .89   .87   .84   .82   .80   .77   .74   .72
  .55    .97   .96   .93   .91   .88   .86   .83   .81   .78   .74   .72
  .60    .98   .97   .95   .92   .90   .87   .85   .82   .79   .75   .73
  .65    .99   .98   .96   .94   .92   .89   .86   .83   .80   .75   .73
  .70   1.00   .99   .97   .96   .93   .91   .88   .84   .80   .76   .73
  .75   1.00  1.00   .98   .97   .95   .92   .89   .86   .81   .76   .73
  .80   1.00  1.00   .99   .98   .97   .94   .91   .87   .82   .77   .73
  .85   1.00  1.00  1.00   .99   .98   .96   .93   .89   .84   .77   .74
  .90   1.00  1.00  1.00  1.00   .99   .98   .95   .91   .85   .78   .74
  .95   1.00  1.00  1.00  1.00  1.00   .99   .98   .94   .86   .78   .74
 1.00   1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00   .88   .78   .74


Table 23 is a small part of a larger table in which "proportions
considered satisfactory in achievement" range from .05 to .95. The
correlation between test score and performance ranges from .00 to
1.00. These tables are strictly accurate only when the distributions
are normal both in the test and in the criterion of performance. They
may be used with considerable confidence when the distributions are
approximately normal, especially when the N's are large; and in any
case they furnish useful information.
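
Under the normality assumption just stated, an entry of this kind can be approximated by numerical integration over a normal correlation surface. The Python sketch below is one way of doing so and is offered only as an illustration: it is not claimed to be the method by which the published table was computed, and the function name, the midpoint rule, and the integration limits are all assumptions of the example.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def proportion_satisfactory(r, base_rate, selection_ratio, steps=4000, upper=8.0):
    """Approximate P(criterion above its cut | test above its cut).

    Assumes test and criterion are jointly normal with correlation r.
    base_rate: proportion of the whole group counted satisfactory.
    selection_ratio: proportion selected from the top of the test distribution.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    x_cut = STD_NORMAL.inv_cdf(1.0 - selection_ratio)  # test cutting score (z)
    y_cut = STD_NORMAL.inv_cdf(1.0 - base_rate)        # criterion cutting score (z)
    spread = (1.0 - r * r) ** 0.5                      # sigma of criterion given test
    width = (upper - x_cut) / steps
    total = 0.0
    for i in range(steps):
        x = x_cut + (i + 0.5) * width                  # midpoint of the strip
        p_success = 1.0 - STD_NORMAL.cdf((y_cut - r * x) / spread)
        total += STD_NORMAL.pdf(x) * p_success * width
    return total / selection_ratio


print(round(proportion_satisfactory(0.50, 0.70, 0.50), 2))
print(round(proportion_satisfactory(0.50, 0.70, 0.20), 2))
print(round(proportion_satisfactory(0.00, 0.70, 0.50), 2))
```

The three calls correspond to the cases discussed in the text: the best half and the best fifth of the class when r = .50, and selection without regard to the test (r = .00).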

* Taylor, H. C., and Russell, J. T., "The Relationships of Validity Coefficients
to the Practical Effectiveness of Tests in Selection: Discussion and
Tables," Journal of Applied Psychology, 1939, 23, 565-578.

Forecasting tables have considerable value in selecting personnel
for business or other vocations. First, we must determine what proportion
of a given group of workers is to be considered "successful."
With this information in hand and knowing the correlation between
our test battery and performance in the given activity, we may forecast
the probable success of groups of new applicants from their test
scores. Assume, for example, that 70% of a group of factory workers
are regarded as "acceptable workers," acceptability having been
determined from ratings by foremen, number of pieces done in a
given time, or time taken to complete certain standard jobs. Assume,
further, that a test battery has a correlation of .45 with worker-performance.
Then if we select the best twenty out of 100 applicants
("best" according to our tests), we find from Table 23 that 90% of
this number or eighteen should be acceptable workers. If we had had
no test and had simply selected the first twenty applicants to appear
(or any twenty), 70% or fourteen should be acceptable. Use of the
tests improves our forecast by nearly 30% (eighteen acceptable workers
instead of fourteen); and the more stringent the criterion
of acceptability the greater the improvement in forecast made by the
tests.


III. The Effect of Variability of Scores upon the Size of r


Suppose that the correlation between two tests in a group of 50 
sixth-grade children has been found to be .50. How will this correla- 
tion compare with that between the same tests in a group of greater 
range, e.g., a group of 200 children spread over grades 6, 7, and 8? 
More generally, knowing the correlation between two tests in a group 
of narrow range of talent, can we predict the probable correlation in 
a group of wider range of talent? 

The problem of the effect upon r of the "range of talent" (size of
$\sigma_x$ and $\sigma_y$) within the group being studied often arises in correlational
work. It becomes important, for example, when one wishes to go
beyond the correlation obtained in the sample with which one is
working and to generalize (estimate the r) for a group of wider
range; or when r's between the same tests obtained in different
ranges are to be compared. A formula for estimating the correlation
between two tests in a heterogeneous group when we know the correlation
between the tests in a homogeneous group may be developed
in the following way: Let $\sigma_{(est.\ y_c)}$ be the standard error of estimate
in a group somewhat curtailed in variability or in range of talent;
and $\sigma_{(est.\ y_u)}$ be the standard error of estimate in a larger group less
restricted in variability. (Y is the dependent variable, p. 156.) Then,
on the assumption that our tests are as effective in the wide as in the
narrow range, $\sigma_{(est.\ y_c)} = \sigma_{(est.\ y_u)}$; or, by formula (33), p. 162,

$\sigma_{y_c} \sqrt{1 - r^2_{x_c y_c}} = \sigma_{y_u} \sqrt{1 - r^2_{x_u y_u}}$

and

$\frac{\sigma_{y_c}}{\sigma_{y_u}} = \frac{\sqrt{1 - r^2_{x_u y_u}}}{\sqrt{1 - r^2_{x_c y_c}}}$    (35)

(formula for estimating correlation in a wide range from a
knowledge of the correlation in a narrow range)


in which $\sigma_{y_c}$ is the standard deviation of Y in the group of curtailed
range; $\sigma_{y_u}$ is the standard deviation of Y in the group of uncurtailed
range; $r_{x_c y_c}$ = the correlation in the curtailed group, and $r_{x_u y_u}$ = the
correlation in the uncurtailed group.

To illustrate formula (35), suppose that in one group $\sigma_{y_c}$ = 10
and $r_{x_c y_c}$ is .50. What would the r between the same two tests probably
be in a group in which $\sigma_{y_u}$ = 15, i.e., in which $\sigma_{y_u}$ is 50% larger
than $\sigma_{y_c}$? Substituting $\sigma_{y_c}$ = 10, $\sigma_{y_u}$ = 15, and $r_{x_c y_c}$ = .50 in (35),
we have

$\frac{10}{15} = \frac{\sqrt{1 - r^2_{x_u y_u}}}{\sqrt{1 - .25}}$


Squaring both sides of this equation gives $.444 = (1 - r^2_{x_u y_u})/.75$, and
solving, we have $r_{x_u y_u}$ = .82. The r of .50 in the narrow range becomes an r of .82 in
the wide range. It is clear from this example that direct comparison
of r's is not valid when the variabilities ($\sigma$'s) within the groups from
which the r's were computed are quite different.
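
Formula (35) can be solved once for the correlation in the wide range, which gives $r_{x_u y_u} = \sqrt{1 - (\sigma_{y_c}/\sigma_{y_u})^2 (1 - r^2_{x_c y_c})}$. The Python sketch below applies this rearrangement to the example in the text; the function name and argument names are illustrative assumptions.

```python
import math


def correlation_in_wide_range(r_narrow, sigma_narrow, sigma_wide):
    """Solve formula (35) for the correlation in the uncurtailed group.

    Rests on the text's assumption that the standard error of estimate
    is the same in the narrow and the wide range.
    """
    ratio_squared = (sigma_narrow / sigma_wide) ** 2
    return math.sqrt(1.0 - ratio_squared * (1.0 - r_narrow ** 2))


r_wide = correlation_in_wide_range(r_narrow=0.50, sigma_narrow=10.0, sigma_wide=15.0)
print(round(r_wide, 2))

# When the two sigmas are equal the correlation is unchanged.
print(round(correlation_in_wide_range(0.50, 10.0, 10.0), 2))
```

The same function serves for formula (36) if the sigmas of X are supplied in place of those of Y.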

If X and not Y is the dependent variable, formula (35) becomes

$\frac{\sigma_{x_c}}{\sigma_{x_u}} = \frac{\sqrt{1 - r^2_{x_u y_u}}}{\sqrt{1 - r^2_{x_c y_c}}}$    (36)

(formula for estimating correlation in a wide range from
a knowledge of the correlation in a narrow range)


Formulas (35) and (36) are open to the objection that each takes
account of only one distribution in estimating the probable increase
in r with increase in range of talent. If, however, the increase
in $\sigma_y$ as the group becomes more heterogeneous is accompanied by a
proportional increase in $\sigma_x$ (or vice versa), formulas (35) and (36)
will give accurate estimates. Experimental trial of these formulas
has yielded results closely in accord with theoretical expectation.*


IV. The Solution of a Second Correlation Problem 


The solution of a second correlation problem will be found in Fig- 
ure 41. The purpose of another “model” is to strengthen the reader's 
grasp of correlational techniques by having him work straight 
through the process of calculating r and the regression equations 
upon a new set of data. A student often fails to relate the various
aspects of a correlational problem when these are presented in piecemeal
fashion.


1. Calculation of r


Our first problem in Figure 41 is to find the correlation between the 
LQ.'s achieved by 190 children of the same—or approximately the 
same—chronological age who have taken an intelligence examination 
upon two occasions separated by a six-month interval. The correla- 
tion table has been constructed from a scattergram, as described on 
page 129. The test given first is the X-variable, and the test given 
second is the Y-variable. The calculation of the two means, and of 
$c_x$, $c_y$, $\sigma_x$, and $\sigma_y$ covers familiar ground, is given in detail on the
chart, and need not be repeated here. 

The product-deviations in the x'y' column have been taken from
column 100-104 (column containing the $AM_x$) and from row 105-109
(row containing the $AM_y$). The entries in the Σx'y' column have
been calculated by the shorter method described on page 137; that is,
each cell entry in a given row has been multiplied first by its
x-deviation (x') and the sum of these deviations entered in the column Σx'.
The Σx' entries were then "weighted" once for all by the y' of the
whole row. To illustrate, in the first row reading from left to right
(1 × 5) + (1 × 6) or 11 is the Σx' entry. The x''s are 5 and 6,
respectively, and may be read from the x' row at the bottom of the
correlation table. Since the common y' is 5, the final Σx'y' entry is
55. Again in the seventh row reading down from the top of the
diagram (5 × −3) + (3 × −2) + (7 × −1) + (16 × 0) + (2 × 1)


* Peters, C. C. and Van Voorhis, W. R., Statistical Procedures and Their 
Mathematical Bases (New York: McGraw-Hill, 1940), pp. 208-212. 
+ (4 × 2) or −18 makes up the Σx' entry. The y' of this row is −1,
and the final Σx'y' entry is 18. To take still a third example, in the
eleventh row from the top of the diagram, (1 × −5) + (3 × −4)
+ (1 × −3) + (2 × −2) or −24 is the Σx' entry. The common y'
is −5 and the Σx'y' entry is 120.
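
The row-by-row shortcut just described can be sketched in a few lines. The three rows below are the ones quoted in the text; the rest of Figure 41 is not reproduced here, and the helper name `row_sum_xy` is an illustrative assumption.

```python
def row_sum_xy(cells, y_dev):
    """Return (sum of f*x', sum of f*x'*y') for one row of a correlation table.

    cells: list of (frequency, x') pairs for the occupied cells of the row.
    y_dev: the y' deviation common to the whole row.
    """
    sum_x = sum(f * x_dev for f, x_dev in cells)
    # Weight the row total once by the common y', as in the shorter method.
    return sum_x, sum_x * y_dev

rows = {
    "row 1": ([(1, 5), (1, 6)], 5),
    "row 7": ([(5, -3), (3, -2), (7, -1), (16, 0), (2, 1), (4, 2)], -1),
    "row 11": ([(1, -5), (3, -4), (1, -3), (2, -2)], -5),
}

for name, (cells, y_dev) in rows.items():
    print(name, row_sum_xy(cells, y_dev))
```

Summing the second member of each pair over all rows of the table would give the Σx'y' needed in the formula for r.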

Three checks of the calculations (see p. 135), upon which r, $\sigma_x$ and
$\sigma_y$ are based, are given in Figure 41. Note that Σfx' = Σx'; and that,
when the Σx'y''s are recalculated at the bottom of the chart,
Σfy' = Σy', and the two determinations of Σx'y' are equal. When the
Σx'y''s have been checked, the calculation of r by formula (22) is a
matter of substitution. Note carefully that $c_x$, $c_y$, $\sigma_x$, $\sigma_y$ are all left
in units of class-interval in the formula for r (p. 139).


2. Calculation of the regression equations and the SE's of estimate 


The regression equations in deviation form are given on the chart 
and the two lines which these equations represent have been plotted 
on the diagram. Note that these equations may be plotted as they 
stand, since the class-interval is the same for X and Y (p. 158). In 
the routine solution of a correlational problem it is not strictly neces- 
sary to plot the regression lines on the chart. These lines are often 
of value, however, in indicating whether the means of the X- and 
Y-arrays can be represented by straight lines, that is, whether regres- 
sion is linear. If the relationship between X and Y is not linear, 
other methods of calculating the correlation must be employed 
(p. 371). 

The standard errors of estimate, shown in Figure 41, are 7.83 and
8.55, depending upon whether the prediction is of Y from X or X
from Y. All I.Q.'s predicted on the Y-test from X may be considered
to have the same error of estimate,* and similarly for all predictions
of X from Y.

Errors of estimate are most often used to give the reliability of
specific predicted measures. But they also have a more general
interpretation. Thus a $\sigma_{(est.\ y)}$ of 7.83 points means that 68% of the I.Q.'s
predicted on test Y from test X may be expected to differ from their
actual values by not more than ±7.83 points, while the remaining
32% may be expected to differ from their actual values by more than
±7.83 points.
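
The 68% and 32% figures rest on the assumption that errors of estimate are normally distributed. A minimal sketch of that check, using only the standard library (the function name `share_within` is ours):

```python
import math

def share_within(k_sd):
    """Proportion of a normal distribution lying within +/- k_sd standard deviations."""
    return math.erf(k_sd / math.sqrt(2.0))

inside = share_within(1.0)
print(round(inside, 4), round(1.0 - inside, 4))
```

The two proportions are about .68 and .32, which is why errors larger than one standard error of estimate are expected in roughly one prediction out of three.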


* See, however, Terman, L. M., and Merrill, M. A., Measuring Intelligence
(Boston: Houghton Mifflin Co., 1937), pp. 44-47, where the SE's of
estimate have been computed for various I.Q. levels.
3. The "regression effect" in prediction


Predicted scores tend to "move in" toward the mean of the
distribution into which prediction is made (p. 154). This so-called
regression effect has often been noted by investigators and is always
present when correlation is less than ±1.00.* The regression
phenomenon can be clearly seen in the following illustrations: From the
regression equation Y = .69X + 32.6 (Fig. 41) it is clear that a child
who earns an I.Q. of 130 on the first test (X) will most probably
earn an I.Q. of 122 on the second test (Y); while a child who earns
an I.Q. of 120 in X will most probably score 115 in Y. In both of
these illustrations the predicted Y-test I.Q. is lower than the first
or X-test I.Q. Put differently, the second I.Q. has regressed or moved
down toward the mean of test Y, i.e., toward 102.7. The opposite effect
occurs when the I.Q. on the X-test is below its mean: the tendency
now is for the predicted score in Y to move up toward its mean.
Thus from the equation Y = .69X + 32.6, we find that if a child
earns an I.Q. of 70 on the X-test his most likely score on the second
test (Y) is 81; while an I.Q. of 80 on the first test forecasts an I.Q. of
88 on the second. Both of these predicted I.Q.'s have moved up
nearer to the mean 102.7 (i.e., $M_y$).
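
The four forecasts above follow directly from the score-form equation. The sketch below simply evaluates Y = .69X + 32.6 for the X-values used in the text; the slope and intercept are those quoted from Figure 41, and the function name is an assumption of ours.

```python
def predict_y(x, slope=0.69, intercept=32.6):
    """Most probable Y-test I.Q. for a given X-test I.Q. (equation from Fig. 41)."""
    return slope * x + intercept

for x in (130, 120, 80, 70):
    print(x, round(predict_y(x)))
```

Scores above the mean are forecast downward and scores below it upward, which is the regression effect in numerical form.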

The tendency for all scores predicted from a regression equation
to pull in (down or up) toward the mean can be seen as a general
phenomenon if the regression equation is written in standard-score
form. Given


\[ y = r \frac{\sigma_y}{\sigma_x} x \qquad (29),\ \text{p. 154} \]


if we divide both sides of this equation by $\sigma_y$ and write $\sigma_x$ under x,
we have


\[ \frac{y}{\sigma_y} = r \frac{x}{\sigma_x} \quad \text{or} \quad z_y = r z_x \qquad (37) \]


(regression equation when scores in X and Y are expressed
as z or σ-scores)


In the problem in Figure 41, $z_y$ = .76$z_x$. If $z_x$ is ±1.00σ, or ±2.00σ,
or ±3.00σ from $M_x$, $z_y$ will be ±.76σ, ±1.52σ, or ±2.28σ from $M_y$.
That is to say, any score above or below the mean of X forecasts a
Y-score somewhat closer to the mean of Y.
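
Formula (37) reduces prediction to a single multiplication, as the following sketch shows. The value r = .76 is the one quoted for Figure 41; `predicted_z` is an illustrative name, not part of the original treatment.

```python
def predicted_z(z_x, r=0.76):
    """Standard-score form of the regression equation: z_y = r * z_x."""
    return r * z_x

for z_x in (-3.0, -2.0, -1.0, 1.0, 2.0, 3.0):
    print(z_x, round(predicted_z(z_x), 2))
```

Because r is numerically less than 1.00, every predicted z lies closer to zero than the z from which it was forecast.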


* Thorndike, R. L., "Regression Fallacies in the Matched Groups
Experiment," Psychometrika, 1942, 7, 85-102.




In studying the relation of height in parent and offspring, Galton 
(p. 154) interpreted the phenomenon of regression to the mean to be 
a provision of nature designed to protect the race from extremes.
This same effect occurs, however, in any correlation table in which r
is less than ±1.00, and need not be explained in biological terms.
The I.Q.'s of a group of very bright children, for instance, will tend
upon retest to move downward toward 100, the mean of the group;
while the I.Q.'s of a group of dull children will tend upon retest to
move upward toward 100.


V. The Interpretation of the Coefficient of Correlation 


When should a coefficient of correlation be called "high," when
"medium," and when "low"? Does an r of .40 between two tests
indicate "marked" or "low" relationship? How high should an r be
in order to permit accurate prediction from one variable to another?
Can an r of .50, say, be interpreted with respect to "overlap" of
determining factors in the two variables correlated? Questions like
these, all of which are concerned with the significance or meaning of
the relationship expressed by a correlation coefficient, constantly arise
in problems involving mental measurement, and their implications
must be understood before we can effectively employ the
correlational method.

The value of r as a measure of correspondence may be profitably
considered from two points of view. In the first place, r's are
computed in order to determine whether there is any correlation (over
and above chance) between two variables; and in the second place,
r's are computed in order to determine the degree or closeness of
relationship when some association is known, or is assumed, to exist.
The question, “Is there any correlation between brain weight and 
intelligence?”, voices the first objective. And the question, “How 
significant is the correlation between high-school grades and first- 
year performance in college?”, expresses the second. The problem 
of when an obtained r denotes significant relationship will be con- 
sidered later, on page 197. This section is concerned mainly with 
the second problem, namely, the evaluation—with respect to degree 
of relationship—of an obtained coefficient. The questions at the 
beginning of the paragraph above all bear upon this topic. 
1. The interpretation of r in terms of verbal description


It is customary in mental measurement to describe the correlation
between two tests in a general way as high, marked or substantial,
low or negligible. While the descriptive label applied will vary
somewhat in meaning with the author using it, there is fairly good
agreement among workers with psychological and educational tests that an


r from .00 to ±.20 denotes indifferent or negligible relationship;
r from ±.20 to ±.40 denotes low correlation; present but slight;
r from ±.40 to ±.70 denotes substantial or marked relationship;
r from ±.70 to ±1.00 denotes high to very high relationship.
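
The four bands above can be written as a small lookup, offered only as a sketch of the convention. The bands share their end points (.20, .40, .70), so assigning each boundary value to the higher band is our assumption, as is the function name `describe_r`.

```python
def describe_r(r):
    """Return the conventional verbal label for a correlation coefficient.

    Boundary values (.20, .40, .70) are placed in the higher band; the
    source does not say which band they belong to.
    """
    size = abs(r)  # the labels depend on the size of r, not its sign
    if size > 1.0:
        raise ValueError("r must lie between -1.00 and +1.00")
    if size < 0.20:
        return "indifferent or negligible"
    if size < 0.40:
        return "low; present but slight"
    if size < 0.70:
        return "substantial or marked"
    return "high to very high"

print(describe_r(0.30))
print(describe_r(-0.85))
```

A label obtained in this mechanical way is no more than a first description of the coefficient.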


This classification is broad and somewhat tentative, and can only 
be accepted as a general guide with certain reservations. Thus a 
coefficient of correlation must always be judged with regard to 


(1) the nature of variables with which we are dealing;
(2) the significance of the coefficient;
(3) the size and variability of the group (p. 166);
(4) the reliability coefficients of the tests used (p. 342);
(5) the purpose for which the r was computed.


To consider, first, the matter of the variables being correlated, an r 
of .30 between height and intelligence, or between head measurements 
and mechanical ability would be regarded as important although it 
is rather low, since correlations between physical and mental func- 
tions are usually much lower—often zero. On the other hand, the 
correlation must be .70 or more between measures of general intelli- 
gence and school grades or between achievement in English and in 
history to be considered high, since r's in this field usually run from
.40 to .60. Resemblances of parents and offspring with respect to
physical and mental traits are expressed by r's of .35 to .55; and,
accordingly, an r of .60 would be high.* By contrast, the reliability
of a standard intelligence test is ordinarily much higher than .60, and
the self-correlation of such a test must be .85 to .95 to be regarded
as high. In the field of vocational testing, the r's between test
batteries and measures of aptitude represented by various criteria rarely
rise above .50; and r's above this figure would be considered
exceptionally promising.


* Jones, H. E., A First Study of Parent-Child Resemblance in Intelligence,
27th Yearbook of the N.S.S.E., 1928, Part I, 61-72.




Correlation coefficients must be evaluated also with due regard to 
the reliabilities (p. 332) of the two tests concerned. Because of 
chance errors, an obtained r is always less than its “corrected” value 
(p. 346) and hence, in a sense, is a minimum measure of the relation- 
ship present. The effect upon an r of the size and variability of the 
group is discussed elsewhere (p. 167), and a formula for estimating 
such effect provided. The purpose for which the correlation has been 
computed is important.* The r which is to be employed in predicting 
the scores of individuals from one test to another, for instance, should 
be much higher than the r the purpose of which is to provide fore- 
casts of the achievement of selected groups (p. 344). 

In summary, a correlation coefficient is always to be judged with 
reference to the circumstances under which it was obtained. There 
is no such thing as the correlation between mechanical aptitude and 
abstract intelligence, for instance, but only a correlation between 
certain tests of mechanical aptitude and intelligence given to certain 
groups under definite conditions. Correlation coefficients are always 
to be thought of as relative and never as absolute indices of 
relationship. 


2. The interpretation of r in terms of $\sigma_{(est.)}$ and the coefficient of
alienation


One of the most practical ways of evaluating the effectiveness of a
coefficient of correlation is through the standard error of estimate,
$\sigma_{(est.)}$. We have found (p. 161) that $\sigma_{(est.\ y)}$, which equals
$\sigma_y\sqrt{1 - r^2}$, enables us to tell how accurately we can estimate (by
means of the regression equation) an individual's score in Test Y
when we know his score in Test X. The size of $\sigma_{(est.\ y)}$ depends
directly upon $\sigma_y$ and upon the correlation between the two tests.
When r = 1.00, $\sigma_{(est.\ y)}$ = .00, and we can predict a person's score
in Y, knowing his score in X, with 100% accuracy, that is, with no error.
On the other hand, when r = .00, $\sigma_{(est.\ y)}$ = $\sigma_y$, and we can only be
certain that the predicted score lies somewhere within the limits of the
Y-distribution, i.e., within the limits Mean Score ± 3$\sigma_y$. In other
words, when r = .00 our estimate of a person's Y-score is not aided
at all by a knowledge of his score in X. As r decreases from 1.00 to
.00, the standard error of estimate increases so markedly that
predictions from the regression equation range all the way from
certainty to what is virtually a "guess." * The significance of an r, with
respect to predictive value, therefore, may be accurately gauged by
the extent to which r improves our prediction over a "mere guess."


* Stead, W. H., and Shartle, C. L., Occupational Counseling Techniques,
op. cit., Chapters 7 and 8.

The following problem will serve as an illustration: Suppose that
the correlation between two tests Y and X is .60, and that $\sigma_y$ = 5.00.
Then $\sigma_{(est.\ y)}$ is 5 × $\sqrt{1 - .60^2}$ or 4.00. This SE is 20% less than 5.00,
the $\sigma_{(est.\ y)}$ when r = .00, i.e., when $\sigma_{(est.\ y)}$ has minimum predictive
value. The amount of reduction in $\sigma_{(est.\ y)}$ as r varies from .00 to
1.00 is given by the expression $\sqrt{1 - r^2}$, and hence it is possible from
$\sqrt{1 - r^2}$ alone to gauge the predictive value of an r. The expression
$\sqrt{1 - r^2}$ is often called the coefficient of alienation and is denoted by
the letter k. The coefficient of alienation may be thought of as
measuring the absence of relationship between two variables X and Y in
the same sense in which r measures the presence of relationship.
When k = 1.00, r = .00, and when k = .00, r = 1.00: the larger the
coefficient of alienation the smaller the degree of relationship, and
the less precise the prediction from X to Y. In order to show how the
estimate improves as r increases, the k's for certain values of r from
.00 to 1.00 are tabulated in Table 24.
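
The illustration above may be verified with a short sketch of the two quantities involved. The function names `alienation` and `se_estimate` are ours; the formulas are k = √(1 − r²) and σ(est. y) = σ_y · k as given in the text.

```python
import math

def alienation(r):
    """Coefficient of alienation, k = sqrt(1 - r^2)."""
    return math.sqrt(1.0 - r * r)

def se_estimate(sd_y, r):
    """Standard error of estimate of Y predicted from X: sd_y * k."""
    return sd_y * alienation(r)

k = alienation(0.60)
se = se_estimate(5.00, 0.60)
print(round(k, 2), round(se, 2), round(100 * (1 - se / 5.00)), "% reduction")
```

With r = .60 the standard error of estimate is four-fifths of σ_y, a reduction of 20 per cent.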


TABLE 24 Coefficients of alienation (k) for values of r from .00 to 1.00


    r        k = √(1 − r²)          r        k = √(1 − r²)

  .0000        1.0000             .8000        .6000
  .1000         .9950            (.8660)      (.5000)
  .2000         .9798             .9000        .4359
  .3000         .9539             .9500        .3122
  .4000         .9165             .9800        .1990
  .5000         .8660             .9900        .1411
  .6000         .8000            1.0000        .0000
  .7000         .7141
 (.7071)       (.7071)


Note that r must be .866 before k lies halfway between 1.00 and
.00, before the standard error of estimate is reduced to one-half of
its value where r = .00. For r's of .80 or less, the coefficients of
alienation are clearly so large that predictions of individual scores
based upon the regression equation are little better than "guesses." †


* The term "guess" as here used does not imply an estimate which is based
upon no information whatsoever (a shot in the dark, so to speak). When
r = .00, the most probable Y-score predicted for every individual in the
X-distribution is $M_y$, and $\sigma_{(est.\ y)}$ = $\sigma_y$. Hence, our Y-estimates are "guesses" in the
sense that they may lie anywhere in the Y-distribution, but not anywhere
at all!

† An r is more efficient in forecasting the probable success of a group (see
p. 163).




Even when r = .99, the standard error of estimate is still 1/7 as large 
as when r = .00. In contrast to actuarial prediction, therefore, the 
estimation of an individual's score in one test from his score in 
another is not often warranted unless r is at least .90.
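
Two of the statements made about Table 24 can be checked by inverting the relation between r and k. Since r² + k² = 1, the r needed for a given k is √(1 − k²); the helper names below are illustrative assumptions.

```python
import math

def k_from_r(r):
    """Coefficient of alienation for a given r."""
    return math.sqrt(1.0 - r * r)

def r_needed_for_k(k):
    """Size of r required to bring the coefficient of alienation down to k."""
    return math.sqrt(1.0 - k * k)

# r required to halve the standard error of estimate (k = .50).
print(round(r_needed_for_k(0.50), 3))

# Ratio of the SE of estimate at r = .00 to that at r = .99.
print(round(1.0 / k_from_r(0.99), 1))
```

The first value is the .866 quoted in the text, and the second is close to 7, so that at r = .99 the standard error is still about one-seventh of its value at r = .00.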

The coefficient E given by the formula below is often useful in
providing a quick estimate of the predictive efficiency of an obtained
r. E, which is called the "coefficient of forecasting efficiency" or the
coefficient of dependability, is derived from k as follows:


\[ E = 1 - \sqrt{1 - r^2} \quad \text{or} \quad E = 1 - k \qquad (38) \]


("coefficient of forecasting efficiency" or coefficient
of dependability) *


To illustrate the application of E, suppose that the correlation of a
test (or of a test battery) with some criterion of performance is .50.
From formula (38) E = 1 − .87 or .13; and the test's efficiency in
predicting criterion scores may be put at 13%. When r = .90,
E = .56 and the test is 56% efficient; when r = .98, E = .80 and the
test is 80% efficient, and so on. Obviously, the correlation must be
above .87 for the test's forecasting efficiency to be greater than 50%.

E gives essentially the same information as $\sigma_{(est.\ y)}$ or k. Thus, if
r = .50, k = .87 and $\sigma_{(est.\ y)}$ is 87% of $\sigma_y$, which is its value when
r = .00. Accordingly, an r of .50 reduces the $\sigma_{(est.\ y)}$ by 13%.
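
Formula (38) is easily tabulated for the values of r used in the illustration. The sketch below assumes nothing beyond E = 1 − √(1 − r²); the function name is ours.

```python
import math

def forecasting_efficiency(r):
    """Coefficient of forecasting efficiency, E = 1 - sqrt(1 - r^2)."""
    return 1.0 - math.sqrt(1.0 - r * r)

for r in (0.50, 0.87, 0.90, 0.98):
    print(r, round(100 * forecasting_efficiency(r)), "%")
```

The sign of r does not affect E, since r enters only as r².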


3. The interpretation of r in terms of the coefficient of determination (r²)


The interpretation of r in terms of “overlapping” factors in the 
tests being correlated may be generalized through an analysis of the 
variance (σ²) of the dependent variable, usually the Y test. In
studying the variability among individuals upon a given test, the 
variance of the test scores is often a more useful measure of “spread” 
than is the standard deviation. The object in analyzing the variance 
of Test Y is to determine from the correlation between Y and X what 
part of Test Y’s variance is associated with, or dependent upon, the 
variance of Test X, and what part is determined by the variance of 
factors not in Test X. 

When we have computed the correlation between Tests X and Y, 


* See Conrad, H. S., and Martin, C. B., "The Index of Forecasting Efficiency
for the Case of a 'True' Criterion," Journal of Experimental Education, 1935,
$\sigma_y^2$ provides a measure of the total variance of the Y-scores; and
$\sigma_{(est.\ y)}^2$, which equals $\sigma_y^2(1 - r^2)$, gives a measure of the variance
left in Test Y when that part of the variance caused by Test X has
been ruled out or made constant. Instead of $\sigma_{(est.\ y)}$ the designation
$\sigma_{y.x}$ is often used to denote that variability in X, insofar as it affects
Y, is ruled out. What is meant by the term "X constant" or "ruled
out" may be seen in Figure 39 where the variability within any
column ("140-149," for instance) is given by $\sigma_y\sqrt{1 - r^2}$. X has a
constant value for each column (X = 144.5 in column 140-149, for
example) and accordingly $\sigma_{y.x}$ becomes a measure of the variability
of Y for a constant X. In Figure 39, $\sigma_{y.x}$ is 2.10 in the column
140-149 as compared with a "total" $\sigma_y$ of 2.62.

The relationship between $\sigma_y$ and $\sigma_{y.x}$ may be seen in the following
illustration. If we have the correlation between height and weight
in a group of school children, $\sigma_y^2$ will be reduced to $\sigma_{y.x}^2$ when the
variance in weight is zero, that is, when all of the children in the group have
the same weight. If $\sigma_{y.x}^2$ is subtracted from $\sigma_y^2$ there remains that
part of the variance of Test Y which is associated with X; and if this
is divided by $\sigma_y^2$ we obtain that fraction of the variance of Y
attributable to or associated with X. Carrying out these operations,
we have


\[ \frac{\sigma_y^2 - \sigma_{y.x}^2}{\sigma_y^2} = \frac{\sigma_y^2 - \sigma_y^2(1 - r^2)}{\sigma_y^2} = \frac{\sigma_y^2 r^2}{\sigma_y^2} = r^2, \]


from which it is clear that r² gives the proportion of the variance of
Test Y which is associated with Test X. When used in this way, r²
is called the coefficient of determination. If the correlation between
Tests Y and X is .707, r² is .50. Hence, an r of .707 means that 50%
of the variance of Test Y is associated with the variability in Test X.
Since r² + k² = 1.00, the proportion of the variance in Test Y which
is not associated with Test X is given by k². In the present case,
since r² = .50, k² is also .50.

The coefficient of determination tells us what part of the variance
of Test Y is determined by Test X. But r alone gives us no
information as to the character of the association and we cannot assume a
causal relationship unless we have evidence beyond the correlation.
Inspection of the squares of small coefficients of correlation
emphasizes the slight degree of association, in terms of related changes in
variability, indicated by low r's. An r of .10, for example, or .20, or
.30, between Tests X and Y, indicates that only 1%, 4%, and 9%,
respectively, of the variance of Y is associated with X. On the other
hand, when r is .95, about 90% (r² = .90) of the variance of Test Y
is associated with Test X, only 10% being unrelated. Valuable
insight into the part played by one or more variables in determining
the total variance of a criterion may be obtained through the
coefficient of determination.
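
The percentages quoted in this section follow from squaring r. A brief sketch, in which the function name `variance_shares` is an illustrative assumption:

```python
def variance_shares(r):
    """Return (r^2, k^2): shares of Y's variance associated with X and independent of X."""
    determined = r * r
    return determined, 1.0 - determined

for r in (0.10, 0.20, 0.30, 0.707, 0.95):
    associated, independent = variance_shares(r)
    print(r, round(100 * associated), "%", round(100 * independent), "%")
```

The two shares always sum to 1.00, which is the relation r² + k² = 1.00 stated above.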


4. Summary 


It may be helpful to summarize the main points brought out in 
this section. 


(1) Whether an obtained r is to be regarded as "high," "medium,"
or "low" will depend upon the variables being studied, the
reliability coefficients of the two tests, the size of the group and its
variability, and the purpose for which the r is being computed.
Correlation coefficients are never absolute indices of relationship.


(2) The accuracy with which an r enables us to predict (through
the regression equation) individual scores in Test Y from given
scores in Test X may be determined from $\sigma_{(est.\ y)}$, from E, and
from k, the coefficient of alienation.


(3) The coefficient of determination provides a method of determining
what proportion of the total variance (σ²) of Test Y is associated
with Test X; and what proportion is independent of Test
X. This method of analysis may be extended to problems employing
partial and multiple correlation (p. 396).


PROBLEMS 


1. Write out the regression equations in score form for the correlation
table in example 3, page 149.

(a) Compute $\sigma_{(est.\ y)}$ and $\sigma_{(est.\ x)}$.

(b) What is the most probable height of a boy who weighs 30 pounds?
45 pounds? What is the most probable weight of a boy who is 36
inches tall? 40 inches tall?


2. In example 4, page 149, find the most probable grade made by a child
whose score on Army Alpha is 120. What is the $\sigma_{(est.)}$ of this grade?


3. What is the most probable algebra grade of a child whose I.Q. is 100
(data from example 5, p. 150)? What is the $\sigma_{(est.)}$ of this grade?
4. Given the following data for two tests:


History (X)              English (Y)

$M_x$ = 75.00            $M_y$ = 70.00
$\sigma_x$ = 6.00        $\sigma_y$ = 8.00

$r_{xy}$ = .72


(a) Work out the regression equations in score form.

(b) Predict the probable grade in English of a student whose history
mark is 65. Find the $\sigma_{(est.\ y)}$ of this prediction.

(c) If $r_{xy}$ had been .84 (σ's and means remaining the same) how much
would $\sigma_{(est.\ y)}$ be reduced?


5. The correlation of a test battery with worker efficiency in a large
factory is .40, and 70% of the workers are regarded as "satisfactory."


(a) From seventy-five applicants you select the best twenty-five 
in terms of test score. How many of these should be satisfactory 
workers? 

(b) How many of the best ten should be satisfactory? 

(c) How many in the two groups should be satisfactory if selected at 
random, i.e., without using the test battery? 


6. Plot the regression lines on the correlation diagram given in
example 5, page 150. Calculate the means of the Y-arrays (successive
Y-columns), plot as points on the diagram, and join these points with straight
lines. Plot, also, the means of the X-arrays and join them with straight
lines. Compare these two "lines-through-means" with the two fitted
regression lines (see Fig. 39, p. 153).

7. In a group of 115 freshmen, the r between reaction time to light and
substitution learning is .30. The σ of the reaction times is 20 ms. What
would you estimate the correlation between these two tests to be in a
group in which the σ of the reaction times is 25 ms.?


8. Show the regression effect in example 4, page 149, by calculating the
regression equation in standard-score form. For I.Q.'s ±1.00σ and
±2.00σ from the mean I.Q., find the corresponding school marks in
standard-score form.


9. Basing your answer upon your experience and general knowledge of
psychology, decide whether the correlation between the following pairs 
of variables is most probably (1) positive or negative; (2) high, 
medium, or low. 


(a) Intelligence of husbands and wives. 

(b) Brain weight and intelligence. 

(c) High-school grades in history and physics. 
(d) Age and radicalism. 

(e) Extroversion and college grades. 
10. How much more will an r of .80 reduce a given $\sigma_{(est.)}$ than an r of .40?
An r of .90 than an r of .40?


11. (a) Determine k and E for the following r's: .35; −.50; .70; .95.
Interpret your results.
(b) What is the "forecasting efficiency" of an r of .45? an r of .99?


12. The correlation of a criterion with a test battery is .75. What percent
of the variance of the criterion is associated with variability in the
battery? What percent is independent of the battery?


ANSWERS 


1. Y = .40X + 24.12; X = 1.26Y − 11.52

(a) $\sigma_{(est.\ y)}$ = 1.78; $\sigma_{(est.\ x)}$ = 3.16
(b) 36.12 inches; 42.12 inches; 33.84 pounds; 38.88 pounds

2. 852; б.у) = 70 

3. X = .37Y + 8.16. When Y (I.Q.) is 100, X (algebra) is 45.2; $\sigma_{(est.\ x)}$
= 6.8


4. (a) Y = .96X − 2; X = .54Y + 37.2
(b) 60.4; $\sigma_{(est.\ y)}$ = 5.5
(c) 22%

5. (a) 21
(b) 9
(c) 17.5 and 7 (i.e., 70%)

7. r = .65

8. ±.46 and ±.92

10. Five times as much; seven times as much.


11. (a)    r       k       E

          .35     .94     .06
         −.50     .87     .13
          .70     .71     .29
          .95     .31     .69


(b) 11%; 86%
12. 56%; 44% 
THE RELIABILITY OF THE MEAN
AND OF OTHER STATISTICS 


+ 


I. The Meaning of Reliability 


The true mean or the true σ of any set of measurements (of height,
mechanical aptitude or intelligence, for example) is that hypothetical
value obtained by taking into account the scores made by all of
the members of some defined group called the population. Since it is
rarely if ever possible to measure all of the members of a population,
we must usually be content with "samples"; and owing to slight
differences in the composition of these samples, computed means and
σ's may be somewhat larger or somewhat smaller than their true
values. Population measures are called parameters, and are to be
thought of as fixed reference points. Measurements obtained from 
samples are called statistics. Statistics are always estimates of their 
parameters; and the accuracy of the estimate is a measure of the 
reliability of the statistic. 

While we cannot determine the parameters themselves, we can 
estimate them by computing the amount by which our statistics 
probably diverge from these parameters. This amount, which may 
be large or small, serves as an index of the dependability or trust- 
worthiness of the statistic. Whenever we have calculated a statistic, 
therefore, we should ask ourselves this question: “How accurate an 
estimate is this statistic (mean or SD, say) of the parameter which I 
would get by taking into account the entire population from which 
this sample was drawn?” The purpose of this chapter is to outline 
methods which will enable us to answer this question. The reliability 
of the mean and the median will first be considered; following this 
the reliability of the σ and Q and of certain other useful statistics.
II. The Reliability of the Mean and of the Median


1. The reliability of the mean 


(1) THE STANDARD ERROR (SE) OF THE MEAN ($\sigma_M$)


What is meant by the reliability of the mean can best be under- 
stood by examining the factors upon which the stability of this 
measure depends. Suppose that we wish to know the mean ability of 
college freshmen in the United States as shown by their scores upon 
the American Council Psychological Examination. To measure the 
achievement of college freshmen in general would require in strict 
logic that we test all of the freshmen in the United States. But this
is obviously a stupendous if not an impossible task, and we must
perforce be satisfied with taking the records of a sample as large and
as randomly drawn as possible. The definition of a random sample
is given on page 202. Suffice it to say here that we cannot use fresh- 
men from only a single institution or from only one section of the 
country; and that we must guard against selecting only those with 
high, or only those with low, scholastic records. The more successful 
we are in getting an “unselected” group, the more nearly representa- 
tive this group will be of all freshmen in the country. Evidently, 
therefore, the reliability of a mean depends for one thing upon how 
impartially we have chosen our sample. 

Given an adequate sample, the reliability of a mean can be shown
to depend mathematically upon two characteristics of the
distribution: (1) the number of cases and (2) the variability or spread of
the measures. The formula for the standard error of the mean is


\[ SE_{mean} \ \text{or} \ \sigma_M = \frac{\sigma}{\sqrt{N}} \qquad (39) \]


(standard error of the arithmetic mean)


where σ = standard deviation of the population and
N = number of cases in the sample.


In this formula for the SE of the mean the σ in the numerator is
really the population and not the sample standard deviation. As we
rarely have the population σ we must of necessity use an estimate
of it (p. 190), and our best estimate is the SD of the sample. Modern
writers on statistics often make a distinction between the standard
deviation of the population and the standard deviation of a sample
drawn from this population, designating the population SD by σ and
the sample SD by s. While this distinction is helpful, σ as a symbol
for the standard deviation of a sample has been so widely used in
the psychological literature (σ-scaling, σ-units, and the like) that
in this chapter we shall continue to use only σ (or SD) and not s.
We shall, however, designate the standard deviation as population σ
or sample s when the meaning is not evident from the context.

It is clear that the number of cases influences the mean, since the
addition of even one extra measure to a series will change the mean
unless the additional case happens to coincide with the mean exactly.
Moreover, the addition of 1 score to a set of 10 scores will effect a
greater change in the obtained mean than the addition of 1 score to a
set of 1000 scores, as each case counts for less in the larger group. It
has been shown mathematically that the reliability of a sample mean
increases, not in proportion to the number of scores upon which it is
based, but in proportion to the square root of the number of scores.
The mean obtained from 25 scores is not 25 times, but √25 or 5
times, as reliable as a single score. And a mean based upon an N of
36 is not 4 times as reliable as a mean based upon an N of 9, but only
2 times as reliable, since √36 divided by √9 equals 2.
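
The square-root rule can also be seen empirically by drawing many samples and observing how widely their means scatter. The sketch below is a simulation under stated assumptions: a normal population with mean 100 and σ 15 (hypothetical values chosen for illustration), 4000 samples of each size, and a fixed seed so that the run is repeatable.

```python
import random
import statistics

def sd_of_sample_means(n, trials=4000, mu=100.0, sigma=15.0, seed=0):
    """Draw `trials` samples of size n and return the SD of their means."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(trials)
    ]
    return statistics.stdev(means)

sd_9 = sd_of_sample_means(9)
sd_36 = sd_of_sample_means(36)
# Quadrupling N should roughly halve the scatter of the means.
print(round(sd_9, 2), round(sd_36, 2), round(sd_9 / sd_36, 2))
```

The ratio printed last should fall close to 2, not 4, in agreement with the statement above; the exact figures vary with the seed.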

The reliability of a mean depends also upon the variability of the
separate measures around the obtained mean. If the σ of the sample
is large, we are unable to say where the means of other samples
which we have not drawn will most probably fall, whether they will
be close to, or far from, the given obtained mean. On the other hand,
if the σ is small, we may be fairly certain that other sample means
will fall reasonably close to the mean of our sample. The reliability
of a sample mean, therefore, will vary with the size of the σ; as σ
increases, reliability decreases.

The SE of the mean is an important and much-used formula. It
measures the extent to which this statistic is affected by (a) errors of
measurement as well as by (b) sampling errors, that is, differences
occasioned by fluctuation from sample to sample. A decrease in σ or an
increase in N will cause the standard error to become smaller
numerically. A decrease in σ_M means that the amount by which the
obtained mean probably misses the mean of the population is just so
much less. In short, the reliability of a sample mean increases as
σ_M decreases.
(2) APPLICATION AND USE OF THE SE OF THE MEAN


A problem will serve to illustrate the use and interpretation of the
SE of the mean.


Example (1) In 1883, the Anthropometric Committee of the
British Association found the mean height of 8585 adult males in
the British Isles to be 67.46 inches, with a SD of 2.57 inches. How
reliable is this measure of mean height? Specifically, how much
does it probably diverge from the true mean (parameter) which
might have been obtained had all adult males in the British Isles
been measured?


We cannot answer this question precisely as the value of the true 
mean is, of course, unknown. But we can give an estimate of re- 
liability in terms of the probable divergence of our mean from the 
TM (true mean). Applying formula (39), we find the SE_M to be

$\sigma_M = \frac{2.57}{\sqrt{8585}} = .028$ inch

This SE * approximates to the SD of a distribution of means which,
like our mean of 67.46 inches, are all derived from samples drawn
from the common population. The normal curve in Figure 42 represents
this distribution of sample means: it is centered at the hypothetical
true mean (TM) and its SD (σ_M) is .028 inch. Sample means
fall equally often on the plus and minus sides of the TM. About
two-thirds of such means (actually 68.26%) lie within ±1.00σ_M of the
TM, that is, within a range of ±.028 inch. Also, 95 out of 100
sample means lie within ±2.00σ_M (more exactly ±1.96σ_M) of the
true mean, and accordingly miss the TM by not more than ±.055
inch (±1.96 × .028).

FIG. 42 Sampling distribution of means showing variability of obtained
means around the true or population mean (TM) in terms of σ_M (.028).
[Horizontal axis: −0.084, −0.056, −0.028, TM, 0.028, 0.056, 0.084;
σ_M = .028 inch]

* Our σ of 2.57 inches is our best estimate of the population σ (see p. 190)
and hence is used as our closest approximation to it.

Our mean of 67.46 inches is, of course, one of the sample means
represented in the sampling distribution of Figure 42. Hence the
probability is high (P = .95) that 67.46 inches (or any sample
mean) does not miss the population mean (the parameter) by more
than ±.055 inch. And conversely, the probability is .05 (one chance
in 20) that 67.46 inches does miss the TM by more than ±.055 inch.
Both of these statements are estimates of the reliability of our
sample mean in terms of its probable divergence from the population
mean. Deviations from the TM which are less likely of occurrence
than those listed above may be computed by taking into account
more of the sampling distribution of means in Figure 42.
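
The arithmetic of Example (1) may be retraced in a short sketch. It assumes only the large-sample formula SE_M = σ/√N and the normal-curve deviate 1.96 quoted above; the names `se_mean` and `limit_95` are ours, chosen for illustration.

```python
import math


def se_mean(sd: float, n: int) -> float:
    """Standard error of a mean: sample SD divided by the square root of N."""
    return sd / math.sqrt(n)


se = se_mean(2.57, 8585)
print(round(se, 3))          # 0.028

# The text rounds the SE to .028 before multiplying by 1.96,
# so the sketch does the same to reproduce the printed limit.
limit_95 = 1.96 * round(se, 3)
print(round(limit_95, 3))    # 0.055
```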


Discussion 


How the standard error measures the reliability or stability of an obtained
mean may be more clearly shown perhaps in the following way: Suppose
that we have calculated the mean height of each of 100 groups of men; that
each group contains 8585 subjects; and that the groups or samples are
drawn at random from the general population. The 100 means obtained
from these samples will tend to differ slightly from one another owing to
“errors of sampling,” or sampling fluctuations. Hence, not all samples will
represent with equal fidelity the population from which they have been
drawn. It can be shown mathematically that the frequency distribution of
these sample means will fall into a normal distribution around the “true”
or population mean as their measure of central tendency. Even when the
samples are themselves skewed, the means from such samples will be
normally distributed. This “sampling distribution” of means measures the
errors of sampling or fluctuations in mean values from sample to sample.
In this hypothetical normal distribution of means we find relatively few
large plus or minus deviations; and many small plus, small minus, and zero
deviations. In short, the obtained means will hit very near to the true mean,
or fairly close to it, more often than they will miss it by large amounts.

The mean of our distribution of 100 means is our best estimate of the
“true” or population mean. And our best estimate of the σ of this
distribution of means is the standard error of the mean which we have
calculated. In other words, σ_M measures the spread of sample means
around the true or population mean. It is because of this fact that the
standard error of the mean becomes a measure of the amount by which any
sample mean probably diverges from the population mean.

The results of our hypothetical experiment are represented graphically
in Figure 42, page 184. The 100 sample means are represented by a normal
frequency distribution around the TM (true mean) and σ_M is put equal to
.028. The heights of the different ordinates (y's) represent the frequency
of the various sample means. The σ of a normal distribution when measured
off in the plus and minus directions from the mean includes the
middle 68.26% of the cases. About 68 of our 100 obtained means, therefore,
may be expected to miss the TM by not more than ±1σ_M (±.028 inch);
and about 95 of our obtained means may be expected to miss the TM by
not more than ±2σ_M (±.056 inch). Since our mean of 67.46 inches is one
of these sample means the probability is approximately .95 that 67.46 inches
does not miss the true mean by more than ±.056 inch.
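
The hypothetical experiment can be imitated on a small scale by simulation. The sketch below is an illustration under assumptions of our own, not a reproduction of the height data: it draws 2000 samples of 50 cases each from a deliberately skewed population (an exponential distribution with mean 1 and SD 1) and compares the SD of the resulting means with the value σ/√N predicts.

```python
import random
import statistics

random.seed(0)              # fixed seed so the run can be repeated

N = 50                      # cases per sample (assumed for illustration)
N_SAMPLES = 2000            # number of samples drawn
POP_MEAN = 1.0              # exponential population with rate 1: mean 1
POP_SD = 1.0                # ... and SD 1, markedly skewed

# Mean of each sample.
means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(N))
    for _ in range(N_SAMPLES)
]

observed_se = statistics.pstdev(means)   # SD of the sample means
predicted_se = POP_SD / N ** 0.5         # sigma / sqrt(N)

# Share of sample means falling within +/-1.96 SE of the true mean.
within = sum(abs(m - POP_MEAN) <= 1.96 * predicted_se for m in means)
share_within = within / N_SAMPLES

print(round(predicted_se, 3), round(observed_se, 3), round(share_within, 3))
```

The observed SD of the means should lie close to the predicted standard error, and roughly 95 in 100 of the means should fall within ±1.96 standard errors of the population mean, even though the population itself is far from normal.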


(3) DEFINING RELIABILITY IN TERMS OF LEVELS OF SIGNIFICANCE 


The definition of reliability in terms of the “probable divergence 
of statistic from parameter" is straightforward and reasonable as it 
is evident that confidence can be placed in an obtained mean if there 
is small likelihood of its having missed its true value by a large 
amount. An obvious difficulty with probability statements concerning
reliability, however, arises from our inability to say how far the
sample mean must miss the TM before the expected deviation is to
be judged “large.” The sampling error allowable in a mean will
always depend upon the purpose of the experiment, the standards of
accuracy demanded, the units of measurement employed and other
factors.* An experimenter can never say categorically that a computed
mean is, or is not, reliable, as reliability is a relative, not an
absolute, concept. But he can set up accuracy limits which will mark
off for a given degree of probability the deviation of computed mean
from TM. Degree of confidence in the stability of a given statistic
will then depend upon the accuracy limits imposed.

Two sets of accuracy limits are in general use and have been 
accepted as standard by most investigators. These limits define 
what are called the .05 and .01 levels of significance. How level of 
significance in a mean or other statistic is dependent upon the accu- 
racy limits chosen may be shown in the following way. The sam- 
pling distribution of a mean computed from any fairly large random 
sample will be normal or nearly normal (see Fig. 42). In a normal 
distribution, 95% of the cases (Table A) fall between ±1.96σ_M, so
that the odds are 19:1 that any sample mean will lie within these
limits. Furthermore, 99% of the cases in a normal distribution fall
between ±2.58σ_M and the odds are 99:1 that any sample mean will
lie within these limits. Conversely, 5% of the means can be expected
to lie outside the limits ±1.96σ_M and 1% outside of the limits
±2.58σ_M.

* Garrett, H. E., “Mean Differences and Individual Differences,” Human
Biology, 1943, 15, 155-170.

These two intervals (±1.96σ_M and ±2.58σ_M) constitute, then,
ranges or accuracy limits within which, for a known probability, our
sample mean will fall. Our faith in these limits is expressed by saying
that we may be “confident at the .05 level” that our M lies in the
range TM ± 1.96σ_M; and “confident at the .01 level” that our mean
lies in the range TM ± 2.58σ_M. We can expect to be wrong 5% of
the time if we take the .05 level and 1% of the time if we take the
.01 level. These levels .05 and .01 reflect degrees of assurance,
therefore, the .01 level deserving greater respect than the .05.

We may illustrate the concept of significance levels by reference to
Example (1), p. 184. Taking the ±2.58σ_M accuracy limits, we may
be confident at the .01 level that 67.46 inches does not deviate from
the TM by more than ±.07 inch (±2.58 × .028). The expectation
of an error of ±.07 inch or more in our sample mean is expressed
by a probability of .01. It is extremely doubtful whether our measuring
instrument for height could detect an error of the order ±.07
inch. Therefore, an experimenter would be clearly justified in taking
the sample mean of 67.46 inches (with an SE of .028 inch) as highly
stable and deserving of great confidence.


(4) ESTABLISHING CONFIDENCE-INTERVALS FOR THE TM


So far we have discussed reliability in terms of the probable
divergence of sample mean from TM. Another approach to the
problem of how best to describe the reliability of a statistic is
through the setting up of limits which for a given level of significance
will embrace the TM. Such limits are said to define confidence-intervals.
The method of establishing such intervals is as follows.
It is clear from Figure 43 that in our sampling distribution of mean
heights, TM ± 3σ_M provides reasonable limits within which nearly
all (actually 99.73%) of our sample means can be expected to lie.
Since the TM itself is unknown, all that we can infer with respect to
this parameter is that it could take a range of values, one of which
is the given sample mean. Suppose we take ±3σ_M as a fairly inclusive
working range (Fig. 43). Then if our M falls at the tentative
upper limit of the sampling distribution, TM = M − 3σ_M; while if
M falls at the tentative lower limit of the sampling distribution,
TM = M + 3σ_M. These relations are shown graphically in Figure
43. Since ±3σ_M in a normal distribution include 99.73% of the
cases, the limits specified by M ± 3σ_M are said to define the 99.73%
confidence-interval. Evidently, we may feel confident to a degree
approaching certainty that the TM lies within this range.

FIG. 43 When M falls at +3σ_M, TM = M − 3σ_M;
when M falls at −3σ_M, TM = M + 3σ_M

Intervals which portray other degrees of confidence can be set up
in the same way. We know that 95% of the cases in a normal
distribution fall within the limits ±1.96σ_M and that 99% fall within
the limits ±2.58σ_M (Table A). If we take the limits specified by
M ± 1.96σ_M, we define the 95% confidence-interval for the TM.
Basing our judgment on these limits, in a long series of experiments
we stand to be right 95% of the time and wrong 5%. For still greater
assurance, we may take the limits M ± 2.58σ_M, which define the
99% confidence-interval for the TM.

Let us apply the concept of confidence-intervals to the problem of
heights on page 184. Taking as our limits M ± 1.96σ_M, we have
67.46 ± 1.96 × .028 or a confidence-interval limited by the points
67.41 and 67.51. If we say that this interval contains the TM the
probability of our being right is .95, of our being wrong .05. If we
desire a higher degree of assurance, we can take the 99%
confidence-interval. Here the limiting points are 67.39 and 67.53
(i.e., 67.46 ± 2.58 × .028). Our faith that these limits contain the TM
is expressed by a probability of .99.
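
The two intervals just found can be recomputed in a few lines. This is a sketch only; the helper `confidence_interval` is a name of our own, and the deviates 1.96 and 2.58 are the normal-curve values quoted from Table A.

```python
def confidence_interval(mean: float, se: float, deviate: float) -> tuple:
    """Limits mean +/- deviate * se, returned as (lower, upper)."""
    half_width = deviate * se
    return (mean - half_width, mean + half_width)


# Height problem: M = 67.46 inches, SE of the mean = .028 inch.
low_95, high_95 = confidence_interval(67.46, 0.028, 1.96)
low_99, high_99 = confidence_interval(67.46, 0.028, 2.58)

print(round(low_95, 2), round(high_95, 2))   # 67.41 67.51
print(round(low_99, 2), round(high_99, 2))   # 67.39 67.53
```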

It may seem to many students that use of the confidence-interval
is an exceedingly roundabout way of making an inference concerning
the population mean; that it would be much more straightforward to
say “the chances are 95 in 100 that the TM lies between 67.41 and
67.51.” Such probability statements concerning the value of the
TM are often made and lead to what appears to be virtually the
same result as that given above in terms of confidence-intervals.
Theoretically, however, such inferences regarding the TM are
definitely incorrect, as the TM is not a variable which can take several
values but is a fixed point. The TM has only one value and the
probability that it equals some given figure is always either 100%
or zero (right or wrong). Our probability figures (e.g., .95 or .99)
do not relate to our confidence that the TM itself could take one of
several values within the given range. Rather, the probability used
in specifying confidence-intervals is an expression of our confidence
in the inference, namely, of our confidence that the given interval
includes the TM. This is a subtle point, but a valid one.

The limits of the confidence-interval of a parameter (TM) have
been called by Fisher* fiduciary limits, and the confidence to be
placed in the fiduciary limits as containing the given parameter is
called fiduciary probability. In terms of fiduciary probability, the
reliability of an obtained mean could be stated as follows: “The
fiduciary probability is .95 that the true mean lies in the interval
M ± 1.96σ_M, .05 that it lies outside these limits.”




(5) THE SE OF THE MEAN IN SMALL SAMPLES


It can be shown mathematically that the SD of a sample systematically
underestimates (is smaller than) the population σ,
although this underestimation is not severe unless the samples are
quite small. To correct this tendency toward negative bias, we
must compute the standard deviation of a small sample by the
formula $s = \sqrt{\frac{\Sigma x^2}{N - 1}}$ rather than by the usual
formula, $\sigma = \sqrt{\frac{\Sigma x^2}{N}}$ (p. 51).†


* Fisher, R. A., The Design of Experiments (London: Oliver and Boyd,
1935), pp. 200 f.
† Holtzman, W. H., “The Unbiased Estimate of the Population Variance and
Standard Deviation,” Amer. Jour. Psychol., 1950, 63, 615-617.
When N is less than 50 or so (some statisticians say 30) the
formula for the SE of the mean should read

$\sigma_M = \frac{s}{\sqrt{N}}$ (40) *

(standard error of the mean in small samples)

where $s = \sqrt{\frac{\Sigma x^2}{N - 1}}$

and N = number of cases in the sample.

Formula (40) always provides the best estimate of the SE of the
mean, i.e., of the SD of the sampling distribution of means (Fig. 42,
p. 184), no matter what the size of N. In very large samples, however,
the correction effected by using (40) is so slight as to be
negligible and formula (39) may be safely used. When N is less than
50 it is advisable to use the more exact formula, and it is imperative
that we do so when N is quite small (less than 10, say).

FIG. 44 Distribution of t for degrees of freedom from 1 to ∞. When df
is very large, the distribution of t is virtually normal.
[Curves drawn for several values of df; horizontal axis: scale of t]

[After Lewis, D., Quantitative Methods in Psychology (Iowa City,
1948), p. 188]

* If the SD of the sample has been computed by the formula
$\sigma = \sqrt{\frac{\Sigma x^2}{N}}$, we can make the same correction in σ_M
as that given in formula (40) by using the formula
$\sigma_M = \frac{\sigma}{\sqrt{N - 1}}$.

When we are dealing with small samples, the normal distribution
no longer tells us accurately the amount by which a statistic probably
diverges from its parameter. The sampling distribution to be
used when N is small is not as tall as the normal curve and the
“tails” or ends are somewhat higher. Figure 44 shows graphically
how this distribution, called the t-distribution or “Student’s” *
distribution, compares with the normal. The student should note that
the t-distribution does not differ greatly from the normal unless N
is quite small; and that as N increases in size the t-distribution
approaches more and more closely to the normal form. In the case of
the sampling distribution of the mean, $t = \frac{M - TM}{SE_M}$, or
$t = \frac{M - TM}{\sigma_M}$; that is, t is essentially a σ-score (p. 305).

Selected points in the t-distribution are given in Table D. For N's
increasing in size, this table gives ±t distances beyond which (i.e.,
to the left and right) certain percentages of the sampling distribution
lie. These percents are .10, .05, .02, and .01. An illustration will
make clear the use of Table D in small samples and will introduce
the new concept “degrees of freedom” (see p. 193).


Example (2) Ten measures of reaction time to a light stimulus 
are taken from one practiced Observer. The mean is 175.50 ms 
and the s is 5.82 ms. How reliable is this mean? 


From formula (40) we find that $\sigma_M = \frac{5.82}{\sqrt{10}}$ or 1.84 ms. By definition,
$t = \frac{M - TM}{SE_M}$; in the present case, $t = \frac{175.50 - TM}{1.84}$. We do not
know, of course, the value of the TM in the t-equation; but if we
know the proper number of degrees of freedom we can determine the
value of t at selected points in the t-distribution. The df (degrees of
freedom) available for evaluating the given t are (N − 1) or 9.
Entering Table D with 9 df we read that t = 2.26 at the .05 point and
3.25 at the .01. From the first t we know that 95% of sample
means like 175.50 ms, the mean we have, lie between the TM and
±2.26σ_M, and that 5% fall outside these limits. From the second t
we know that 99% of our sample means lie between the population
mean and ±3.25σ_M, and that 1% fall outside these limits. We may
be confident at the .05 level, therefore, that our mean of 175.50 ms
does not differ from its parameter (TM) by more than ±4.16 ms
(±2.26 × 1.84); and we may be confident at the .01 level that our
mean does not miss the TM by more than ±5.98 ms (±3.25 × 1.84).

* “Student” was the pseudonym of W. S. Gosset, who developed the
t-distribution. See Walker, Helen M., Elementary Statistical Method
(New York: Henry Holt and Co., 1943), p. 159.

Confidence-intervals may also be established for the TM in
this problem by the methods of page 187. Taking as our limits
M ± 2.26σ_M, we have 175.50 ± 4.16 or 171.34-179.66 as indicating
the limits of our .95 confidence-interval. Or taking M ± 3.25σ_M as
broader limits, we have 175.50 ± 5.98 or 169.52-181.48 as marking
off our .99 confidence-interval. If we infer that the population mean
lies within the latter interval, in a long series of experiments we
stand to be right 99% and wrong 1% of the time. The width of the
.99 confidence-interval (11.96) shows clearly the high unreliability
likely to exist in a mean when our estimate is based upon a very
small sample.
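
The small-sample computation of Example (2) may be sketched as follows. The critical values 2.26 and 3.25 are simply copied from the text (Table D, 9 df) rather than computed, and intermediate results are rounded to two decimals as in the worked example; all variable names are ours. The last lines show what the large-sample formula (39) would have given for the same data.

```python
import math

N = 10
mean = 175.50            # mean reaction time in ms
s = 5.82                 # sample SD computed with N - 1 in the denominator
T_05, T_01 = 2.26, 3.25  # Table D entries for 9 df, as quoted in the text

# Formula (40): SE of the mean in small samples.
se = round(s / math.sqrt(N), 2)

# Half-widths of the .95 and .99 confidence-intervals.
half_95 = round(T_05 * se, 2)
half_99 = round(T_01 * se, 2)

ci_95 = (round(mean - half_95, 2), round(mean + half_95, 2))
ci_99 = (round(mean - half_99, 2), round(mean + half_99, 2))

print(se)        # 1.84
print(ci_95)     # (171.34, 179.66)
print(ci_99)     # (169.52, 181.48)

# Formula (39) applied to the same data: sigma uses N, not N - 1.
sigma = s * math.sqrt((N - 1) / N)
se_39 = round(sigma / math.sqrt(N), 2)
print(se_39)     # 1.75
```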

Several points in the solution of this problem deserve further comment
as they illustrate clearly the difference between confidence
levels in large and small samples. Had we used formula (39) in
Example (2) instead of the correct formula (40), the SE of our mean
would have been 1.75 ms instead of 1.84 ms, 5% too small. Again,
the .05 and .01 significance levels in the normal curve are ±1.96 and
±2.58 (p. 187). These limits are 15% and 20% smaller than the
correct t-limits of ±2.26 and ±3.25 read from Table D for 9 df. It
is clear, therefore, that when N is quite small, use of formula (39)
will cause a calculated mean to appear more accurate than it actually
is.

The SE of the mean in the height problem on page 184 was
$\frac{2.57}{\sqrt{8585}}$ or .028 inch. The student should note that had
formula (40) and Table D been used in determining the reliability of the
obtained mean of 67.46 inches, results would not have differed to the third
decimal from those got with formula (39) and Table A. This is true,
of course, because the N of 8585 is very large. As N increases (see Fig.
44), t-entries in Table D approach more and more closely the corresponding
normal curve deviates in Table A. In the normal curve, for
instance (see Table A), 10% of the distribution lie beyond the limits
±1.65, 5% beyond the limits ±1.96, and 1% beyond the limits
±2.58. In Table D the corresponding t-limits for (N − 1) = 50 are
±1.68, ±2.01, ±2.68. For (N − 1) = 100, these limits are ±1.66,
±1.98, ±2.63. When N is very large (see last entry in Table D)
the t-distribution becomes a normal curve. It is only when N is less
than about 50, say, that the t-distribution diverges markedly from
the normal. As research workers in the social sciences rarely use
groups smaller than 50, small-sample statistics are not as generally
useful in psychology and education as they are in biology and
agriculture.*


(6) DEGREES OF FREEDOM 


The concept of “degrees of freedom” which we have encountered
on pages 191-192 is highly important in small-sample statistics. It
is crucial, too, in analysis of variance and will appear increasingly
often in later chapters. The degrees of freedom (df) available for
evaluating a statistic depend upon the number of restrictions placed
upon the observations, one df being lost for each restriction imposed.
Where one restriction comes from can best be shown by a simple
example. If we have five scores, 5, 6, 7, 8, and 9, the M is 7; and the
deviations of these scores around 7 are −2, −1, 0, 1, and 2. The sum
of these deviations is zero.† While there are 5 deviations, only 4 of
these [(N − 1)] can be freely selected as the condition that the sum
of the deviations equals zero immediately fixes the fifth deviation.
When there are N independent scores, there are N degrees of freedom
for computing the M, but only (N − 1) df available for the SD
since this statistic is computed from deviations taken around the M.
In Example (2), page 191, the df available for determining the
reliability of the M were given as (N − 1) or 9: one less than the
number of observations (i.e., 10). Since one df was lost in computing the
M only (N − 1) are left for estimating the reliability of the M by
way of the SD and the t-ratio.
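
The five-score illustration can be verified directly. The sketch below shows that once four deviations are chosen the fifth is fixed, and contrasts the SD computed with N in the denominator against the estimate computed with (N − 1); the variable names are ours.

```python
scores = [5, 6, 7, 8, 9]
n = len(scores)

mean = sum(scores) / n
deviations = [x - mean for x in scores]

# The deviations must sum to zero, so the last one is determined
# by the other four: it equals minus their sum.
forced_last = -sum(deviations[:-1])

sum_sq = sum(d * d for d in deviations)
sd_n = (sum_sq / n) ** 0.5                # usual formula, N in the denominator
sd_n_minus_1 = (sum_sq / (n - 1)) ** 0.5  # estimate using N - 1 (df)

print(mean, deviations)       # 7.0 [-2.0, -1.0, 0.0, 1.0, 2.0]
print(forced_last)            # 2.0
print(round(sd_n, 2), round(sd_n_minus_1, 2))   # 1.41 1.58
```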

Our best estimate of the true (or population) σ (see p. 190) is
obtained by using (N − 1) instead of N in the formula for the σ; that
is, by taking due account of the restriction imposed through calculation
of the M. It is quite important that we take df into account
when N is small; unimportant practically that we do so when N is
large (p. 192). In general the number of degrees of freedom available
at any given time equals N minus the number of parameters already
estimated from the N observations (each parameter adds a restriction).
M is the only parameter estimated before computing the SD,
and accordingly the df available in Example (2) were (N − 1). The
number of df is not always (N − 1), however, but will vary with the
statistic. In determining confidence levels for r, for example, the
available df are (N − 2). Two df are lost, one restriction being imposed
for the M of Y (the dependent variable) and another for the
regression coefficient b, which describes the relation between Y and
X.* Rules for determining the df available in the Chi-square test
and in analysis of variance tables are given in appropriate places in
later chapters.


* Snedecor, George W., Statistical Methods (4th ed.; Ames, Iowa: Iowa State
College Press, 1946), Chaps. 3 and 8.
† Σ(X − M) or Σx (calculation algebraic) is always zero.


2. The reliability of the median 


The standard error of the median is roughly 5/4 times σ_M. In terms
of σ and Q, the SE's of the median are

$\sigma_{Mdn} = \frac{1.253\sigma}{\sqrt{N}}$ (41)

$\sigma_{Mdn} = \frac{1.858Q}{\sqrt{N}}$ (42)

(standard error of the median in terms of σ and of Q)


An example will illustrate the use of formula (42).
Example (3) On the Trabue Language Scale A, 801 twelve-year-old
boys made the following record: Median = 21.40; Q = 4.9.
How reliable is this median? How well does it represent the
median of twelve-year-old boys in general on the given scale?
By formula (42) the $\sigma_{Mdn} = \frac{1.858 \times 4.9}{\sqrt{801}}$ or .32.
Since N is quite large, accuracy limits may be taken at ±1.96 and ±2.58
(last line of Table D). We may be confident at the .05 level that the
median of 21.40 does not miss the population median by more than
±1.96 × .32 or ±.63; and confident at the .01 level that 21.40 does
not differ from the true median by more than ±2.58 × .32 or by
±.83. The .99 confidence-interval for the true median is 21.40 ± .83
or from 20.57 to 22.23. This very narrow range (for which the
P = .99) indicates high stability in the computed median.
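
Example (3) can be retraced numerically. The sketch assumes the two forms of the standard error of the median given above, 1.253σ/√N and 1.858Q/√N, and rounds intermediate results to two decimals as the text does; the function names are ours.

```python
import math


def se_median_from_sd(sd: float, n: int) -> float:
    """SE of the median in terms of sigma: 1.253 * sigma / sqrt(N)."""
    return 1.253 * sd / math.sqrt(n)


def se_median_from_q(q: float, n: int) -> float:
    """SE of the median in terms of Q: 1.858 * Q / sqrt(N)."""
    return 1.858 * q / math.sqrt(n)


median, q, n = 21.40, 4.9, 801

se = round(se_median_from_q(q, n), 2)
limit_05 = round(1.96 * se, 2)
limit_01 = round(2.58 * se, 2)
ci_99 = (round(median - limit_01, 2), round(median + limit_01, 2))

print(se)         # 0.32
print(limit_05)   # 0.63
print(limit_01)   # 0.83
print(ci_99)      # (20.57, 22.23)
```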


III. The Reliability of Measures of Variability


1. The reliability of the standard deviation 


The reliability of the SD, like the reliability of the mean and
median, is determined by calculating the probable discrepancy between
the obtained SD and its parameter (true or population SD).
The formula for the SE of the σ is

$SE_\sigma$ or $\sigma_\sigma = \frac{.71\sigma}{\sqrt{N}}$ (43)

(standard error of a standard deviation)

The sampling distribution of σ is skewed for small samples (N less
than 25, say). When samples are large, however (greater than 100),
and have been drawn at random from a normal population, formula
(43) may be applied and interpreted in the same way as SE_M. To
illustrate, we found on page 184 that for 8585 British males, the SD
around the M of 67.46 inches was 2.57 inches. By formula (43)
$\sigma_\sigma = \frac{.71 \times 2.57}{\sqrt{8585}}$ or .02 inch. Since N is large,
the .99 confidence-interval for the true or population SD can be taken as
σ ± 2.58σ_σ. Substituting for σ and σ_σ we have 2.57 ± 2.58 × .02 or
2.52 to 2.62 as our .99 confidence-interval. If we proceed on the
assumption that the true SD lies within this range we will, in a long
series of experiments, be right 99% and wrong 1% of the time.

* See page 154.

It is not often that we are called upon to compute the SE of σ in
small samples. This is fortunate, as there is no very efficient way of
estimating the reliability of σ when the sample is small.
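
For the large-sample case, the SE of the SD in the height problem can be recomputed as below. The sketch assumes the usual large-sample form of the formula, σ/√(2N), which is approximately .71σ/√N; the constant and the function name `se_sd` are our assumptions, since the printed formula is not fully legible here.

```python
import math


def se_sd(sd: float, n: int) -> float:
    """Large-sample SE of a standard deviation: about .71 * sigma / sqrt(N)."""
    return 0.71 * sd / math.sqrt(n)


sd, n = 2.57, 8585

se = round(se_sd(sd, n), 2)
half_99 = 2.58 * se
ci_99 = (round(sd - half_99, 2), round(sd + half_99, 2))

print(se)      # 0.02
print(ci_99)   # (2.52, 2.62)
```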


2. The reliability of the quartile deviation or Q 


The reliability of Q may be found from the formulas

$\sigma_Q = \frac{.786\sigma}{\sqrt{N}}$ (44)

and

$\sigma_Q = \frac{1.17Q}{\sqrt{N}}$ (45)

(standard error of Q in terms of σ and of Q)

These formulas are applied and interpreted as are the other SE
formulas. On page 194, for example, the median of the 801
twelve-year-old boys who took the Trabue Completion Test was 21.40 with
a Q of 4.9. The SE of this Q by (45) is $\frac{1.17 \times 4.9}{\sqrt{801}}$ or .20.
The .95 confidence-interval for the true or population Q may be taken as
4.5 to 5.3, i.e., 4.9 ± 1.96 × .20. The narrow range of the .95
confidence-interval indicates high stability.
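
The same result may be reached in a few lines. The sketch assumes formula (45) in the form 1.17Q/√N and rounds as the text does; the function name `se_q` is ours.

```python
import math


def se_q(q: float, n: int) -> float:
    """SE of the quartile deviation in terms of Q: 1.17 * Q / sqrt(N)."""
    return 1.17 * q / math.sqrt(n)


q, n = 4.9, 801

se = round(se_q(q, n), 2)
half_95 = 1.96 * se
ci_95 = (round(q - half_95, 1), round(q + half_95, 1))

print(se)      # 0.2
print(ci_95)   # (4.5, 5.3)
```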
IV. The Reliability of Percentages
and Correlation Coefficients


This section will consider the computation and use of the SE's of
a percentage and a correlation coefficient. For the SE's of other
statistics the student should go to the more advanced references. The
Handbook of Statistical Nomographs, Tables and Formulas, by
Dunlap and Kurtz (World Book Co., 1932), contains many formulas
helpful in research.


1. The reliability of a percentage 


It is often feasible to find the percentage of a given group which
exhibits certain behaviors or possesses certain definite attitudes or
other characteristics when it is difficult or impossible to measure
these attributes directly. Given the percentage occurrence of a
behavior the question often arises of how much confidence we can
place in our figure. How reliable an index is it of the incidence of the
behavior in which we are interested? To answer this question, we
must go to the SE of the percentage given by the formula

$\sigma_{\%} = \sqrt{\frac{PQ}{N}}$ (46)

(SE of a percentage)

in which P = the percentage occurrence of the behavior, Q = 1 − P
(that is, 100% − P when P is written as a percentage),
and N is the number of cases.

We may illustrate formula (46) with the following problem. 


Example (1) In a study of cheating among elementary school
children, 144 or 41.4% of 348 children from homes of good
socio-economic status were found to have cheated on the various tests.
Assuming our sample to be representative of children from “good”
social levels, how much confidence can be put in this percentage?
How well does it represent the population or true percentage?


Applying formula (46) we get
$\sigma_{\%} = \sqrt{\frac{41.4\% \times 58.6\%}{348}} = 2.7\%$.
The sampling distribution of percents can be taken as normal
when N is large (larger than 50, say) and when P is not too close
to 0% or 100%. The SE is interpreted like σ_M. Thus in the present
problem the .99 confidence-interval for the population percentage is
41.4% ± 2.58 × 2.7% or from 34.4% to 48.4%. We may feel sure,
therefore, that the percentage of children who will cheat in samples
of this sort is at least as large as 34.4 and not larger than 48.4. The
SE of a percentage finds its chief use in problems in which the
significance of the difference between two percentages is to be determined
(p. 236).
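
Formula (46) is easily applied by machine. The sketch below works in percentage units, so that Q is taken as 100 − P; the function name is ours. Carried to more decimals the SE comes out at about 2.6%, a shade under the rounded 2.7% used above, and the limits printed by the sketch are correspondingly a little narrower than 34.4% to 48.4%.

```python
import math


def se_percentage(p: float, n: int) -> float:
    """SE of a percentage P (in percent units) based on N cases."""
    q = 100.0 - p
    return math.sqrt(p * q / n)


p, n = 41.4, 348

se = se_percentage(p, n)
low_99 = p - 2.58 * se
high_99 = p + 2.58 * se

print(round(se, 2))                          # 2.64
print(round(low_99, 1), round(high_99, 1))   # 34.6 48.2
```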


2. The reliability of the coefficient of correlation (r)


(1) THE SE OF r
The classical * formula for the SE of r is

$\sigma_r = \frac{(1 - r^2)}{\sqrt{N}}$ (47)

(SE of a coefficient of correlation r when N is large)

* Yule, G. Udny, An Introduction to the Theory of Statistics (10th ed.;
London: Charles Griffin and Co., 1932), p. 352.

FIG. 45 There are 95 chances in 100 that the obtained r does not miss
the true r by more than ±.12 (±1.96σ_r). The .99 confidence-interval
for the true r is r ± 2.58σ_r or .60 ± .15, i.e., .45 to .75.
[σ_r = 0.06]

In the height-weight problem on page 129, r = .60 and N = 120. The
SE of r by formula (47), therefore, is $\frac{(1 - .60^2)}{\sqrt{120}}$ or .06
(to two decimals). To test the reliability of r in terms of its SE, we
assume the sampling distribution of r to be normal, place the “true r”
at the center (Fig. 45) of the distribution, and take .06 (i.e., SE_r) to
be the SD of this sampling distribution of r's. Since the probability is .05
of an error exceeding ±1.96σ_r, there is only one chance in 20 that an
error of ±.12 or more exists in our r. Again, the .99 confidence-interval
for the true r can be taken as r ± 2.58σ_r. Substituting for r and
SE_r we get .45 and .75 as the limits of our .99 confidence-interval. It
would seem reasonably certain, therefore, that r is at least as large
as .45.
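
The classical computation for the height-weight r can be sketched as follows, assuming formula (47) in the form (1 − r²)/√N and rounding to two decimals at each step as in the text; the function name `se_r` is ours.

```python
import math


def se_r(r: float, n: int) -> float:
    """Classical large-sample SE of a correlation coefficient."""
    return (1 - r ** 2) / math.sqrt(n)


r, n = 0.60, 120

se = round(se_r(r, n), 2)
limit_05 = round(1.96 * se, 2)
limit_01 = round(2.58 * se, 2)
ci_99 = (round(r - limit_01, 2), round(r + limit_01, 2))

print(se)         # 0.06
print(limit_05)   # 0.12
print(limit_01)   # 0.15
print(ci_99)      # (0.45, 0.75)
```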

There are two serious objections to the use of formula (47). In 
the first place, the r in the formula is really the true or population r. 
Since we do not have the true r, we must substitute the calculated or 


except when the population r is .00 and N is large. When r is high
(.80 or more, say) and N is small, the sampling distribution of r is
skewed and the SE from (47) is decidedly misleading. Skewness in


ability of an r less than .80 in a new sample of 20 cases is much 


mately normal and (2) its SE depends only upon the size of the
sample N, and is independent of the size of r. The formula for σ_z is

$\sigma_z = \frac{1}{\sqrt{N - 3}}$ (48)

(SE of Fisher's function z)
Suppose that т = .85, and N = 52. Then from Table C we read 


* Fisher, В. A., Statistical Methods for Research Workers (8th ed.; London: 
Oliver and Boyd, 1941), рр. 190-203. 
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that am r of .85 corresponds to a z of 1.26. SE, from (48) is І 


ог.14-. Тһе .95 confidence-interval for the true 2 is now .99 to 1.53 
(Le, 1.26 + 1.96% .14 or 126+ -27). Converting these z's back 
into 2-7 "we get a confidence-interval of from -76 to 91. The fiduciary 
proba. bi lity is .95 that this interval contains the true r (p. 189). 
The coefficient of correlation .60 in the height-weight problem 
above is not large enough for the conversion into z to make much 
differen ce in our reliability estimates. An r of .60 is equivalent to a z 


of 69 (Table C), and the SE, is — 1 
20-9 


The 99S confidence-interval for the true z, therefore, is .46 to .92 (i.e., 
69+ 22.58 x (00 or .69 + 28). When we convert these z's back into 
тее .99 confidence-interval for the true r becomes .43 to .73. This 
Tange is almost identical with that on page 198 obtained when we 
used алай, 


or .09 (to two decimals). 
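The two worked examples above can be checked with a short sketch. It uses the exact inverse hyperbolic tangent in place of Table C, so results may differ from the text in the second decimal (the text rounds z and its SE before converting back). The function names are illustrative only, and the form of formula (47) used in `classical_interval`, (1 − r²)/√N, is an assumption: that formula is not reproduced in this passage, though the assumed form does give the SE of .06 quoted for r = .60 and N = 120.

```python
import math


def fisher_z_interval(r, n, multiplier):
    """Confidence limits for the true r by way of Fisher's z."""
    z = math.atanh(r)                  # convert r to z (what Table C tabulates)
    se_z = 1.0 / math.sqrt(n - 3)      # formula (48)
    low_z = z - multiplier * se_z
    high_z = z + multiplier * se_z
    return math.tanh(low_z), math.tanh(high_z)   # convert the z limits back to r


def classical_interval(r, n, multiplier):
    """Limits from r +/- multiplier * SE_r (assumed form of formula 47)."""
    se_r = (1.0 - r * r) / math.sqrt(n)          # assumption, see lead-in
    return r - multiplier * se_r, r + multiplier * se_r


if __name__ == "__main__":
    # r = .85, N = 52, .95 interval (multiplier 1.96)
    print([round(x, 2) for x in fisher_z_interval(0.85, 52, 1.96)])
    # r = .60, N = 120, .99 interval (multiplier 2.58), both methods
    print([round(x, 2) for x in fisher_z_interval(0.60, 120, 2.58)])
    print([round(x, 2) for x in classical_interval(0.60, 120, 2.58)])
```

For r = .60 the two methods give nearly the same limits, while for r = .85 the z limits are noticeably asymmetrical about r, which is the skewness the z-conversion is meant to handle.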


(2) TESTING r AGAINST THE NULL HYPOTHESIS 
The reliability of an obtained r may be tested also against the
hypothesis that the population r is in fact zero.* If the computed r
is large enough to invalidate or cast serious doubt upon this null
hypothesis, we accept r as indicating the presence of at least some
degree of correlation. To make the test, enter Table 25 with
(N − 2) degrees of freedom * and compare the obtained r with the
tabulated entries. Two significance levels, .05 and .01, are given in
Table 25, which is read as follows when, for example, r = .60 and
N = 120. For 118 df the entries at .05 and .01 are by linear interpolation
.18 and .24, respectively (to two decimals). This means that
only 5 times in 100 trials would an r as large as ±.18 arise from
fluctuations of sampling, if the population r were actually .00; and
only once in 100 trials would an r of ±.24 appear if the population r
were in fact .00 (Fig. 46). It is clear that the obtained r of .60, since
it is much larger than .24, is significant at the .01 level.

FIG. 46 When the population r is zero, and df = 118, 5% of the sample
r's exceed ±.18, and 1% exceed ±.24.

* See page 213 for definition of null hypothesis.


TABLE 25 † Correlation coefficients at the 5% and 1% levels of significance


Example: When N is 52 and df is 50, an r must be .273 to be significant at
.05 level, and .354 to be significant at .01 level.


Degrees of                      Degrees of
freedom       .05      .01      freedom       .05      .01
(N − 2)                         (N − 2)
    1        .997    1.000         24        .388     .496
    2        .950     .990         25        .381     .487
    3        .878     .959         26        .374     .478
    4        .811     .917         27        .367     .470
    5        .754     .874         28        .361     .463
    6        .707     .834         29        .355     .456
    7        .666     .798         30        .349     .449
    8        .632     .765         35        .325     .418
    9        .602     .735         40        .304     .393
   10        .576     .708         45        .288     .372
   11        .553     .684         50        .273     .354
   12        .532     .661         60        .250     .325
   13        .514     .641         70        .232     .302
   14        .497     .623         80        .217     .283
   15        .482     .606         90        .205     .267
   16        .468     .590        100        .195     .254
   17        .456     .575        125        .174     .228
   18        .444     .561        150        .159     .208
   19        .433     .549        200        .138     .181
   20        .423     .537        300        .113     .148
   21        .413     .526        400        .098     .128
   22        .404     .515        500        .088     .115
   23        .396     .505       1000        .062     .081


Table 25 takes account of both ends of the sampling distribution; that is,
it does not consider the sign of r. When N = 120, the probability (P/2)
of an r of .18 or more arising on the null hypothesis is .025; and the
probability of an r of −.18 or less is, of course, .025 also. For a P/2
of .01 (or P of .02) the r by linear interpolation between .05 (.18)
and .01 (.24) is .21. On the hypothesis of a population r of zero,
therefore, only once in 100 trials would a positive r of .21 or larger
arise through accidents of sampling.

* Page 193.

† This table is abstracted from the column for 2 variables in Table J, page
437.

The .05 and .01 levels in Table 25 are the only ones needed ordinarily
in evaluating the significance of an obtained r. Several illustrations
of the use of Table 25 in determining significance are given
below:


Size of      Degrees of     Calculated
Sample       Freedom        r              Interpretation
(N)          (N − 2)
  10             8            .70          significant at .05, not at .01 level
 152           150           −.12          not significant
  27            25            .50          significant at .05, barely at .01 level
 500           498            .20          very significant
 100            98           −.30          very significant


It is clear from these examples that even a small r may be significant
if computed from a very large sample, and that an r as high as
.70 may not be significant if N is quite small. Table 25 is especially
useful when N is small. Suppose that we have found an r of .55 from
a sample of 12 cases. Entering Table 25 with (N − 2) or 10 df we
find that r must be .71 to be significant at the .01 level and .58 to be
significant at the .05 level. In this small sample, therefore, even an r
as high as .55 cannot be taken as indicative of any real correlation.
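The table lookup described above, including linear interpolation between tabled degrees of freedom, can be sketched as follows. The dictionary holds the Table 25 entries; the function names and the wording of the returned labels are illustrative choices, not part of the text. The sign of r is ignored, since the table covers both ends of the sampling distribution.

```python
from bisect import bisect_left

# Table 25: df -> (r required at the .05 level, r required at the .01 level)
TABLE_25 = {
    1: (.997, 1.000), 2: (.950, .990), 3: (.878, .959), 4: (.811, .917),
    5: (.754, .874), 6: (.707, .834), 7: (.666, .798), 8: (.632, .765),
    9: (.602, .735), 10: (.576, .708), 11: (.553, .684), 12: (.532, .661),
    13: (.514, .641), 14: (.497, .623), 15: (.482, .606), 16: (.468, .590),
    17: (.456, .575), 18: (.444, .561), 19: (.433, .549), 20: (.423, .537),
    21: (.413, .526), 22: (.404, .515), 23: (.396, .505), 24: (.388, .496),
    25: (.381, .487), 26: (.374, .478), 27: (.367, .470), 28: (.361, .463),
    29: (.355, .456), 30: (.349, .449), 35: (.325, .418), 40: (.304, .393),
    45: (.288, .372), 50: (.273, .354), 60: (.250, .325), 70: (.232, .302),
    80: (.217, .283), 90: (.205, .267), 100: (.195, .254), 125: (.174, .228),
    150: (.159, .208), 200: (.138, .181), 300: (.113, .148), 400: (.098, .128),
    500: (.088, .115), 1000: (.062, .081),
}
TABLED_DF = sorted(TABLE_25)


def critical_r(df):
    """Return (r at .05, r at .01) for df, interpolating linearly if needed."""
    if df in TABLE_25:
        return TABLE_25[df]
    if df < TABLED_DF[0] or df > TABLED_DF[-1]:
        raise ValueError("df lies outside the range of Table 25")
    i = bisect_left(TABLED_DF, df)          # first tabled df above the one wanted
    lower, upper = TABLED_DF[i - 1], TABLED_DF[i]
    weight = (df - lower) / (upper - lower)
    r05 = TABLE_25[lower][0] + weight * (TABLE_25[upper][0] - TABLE_25[lower][0])
    r01 = TABLE_25[lower][1] + weight * (TABLE_25[upper][1] - TABLE_25[lower][1])
    return r05, r01


def significance(r, n):
    """Classify an obtained r from a sample of n cases (df = n - 2)."""
    r05, r01 = critical_r(n - 2)
    size = abs(r)                           # the sign of r is disregarded
    if size >= r01:
        return "significant at .01"
    if size >= r05:
        return "significant at .05"
    return "not significant"


if __name__ == "__main__":
    print([round(x, 2) for x in critical_r(118)])   # the N = 120 example
    for r, n in [(.70, 10), (-.12, 152), (.50, 27), (.20, 500), (-.30, 100)]:
        print(n, r, significance(r, n))
```

Linear interpolation is only a convenience between adjacent tabled rows; for exact work the entries would be computed from the sampling distribution itself.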


V. Sampling and the Use of Reliability Formulas


All of the reliability formulas given in this chapter depend upon
N, the size of the sample, and most of them require some measure of
variability (usually σ). It is unfortunate, perhaps, that there is
nothing in the statement of an SE formula which might deter the 
uncritical worker from applying it to the statistics calculated from 
any set of test scores. But the general and indiscriminate computa- 
tion of SE's will inevitably lead to erroneous conclusions and false 
interpretations. Hence, it is highly important that the research 
worker in experimental psychology and in educational research have 
clearly in mind (1) the conditions under which reliability formulas 
are (and are not) applicable; and that he know (2) what his
reliability formulas may be reasonably expected to do. Some of the
limitations to reliability formulas have been given in this chapter. These
statements will now be amplified and further cautions to be observed 
in the use of SE's will be indicated. 


1. Methods of sampling 


Various techniques have been devised for obtaining a sample 
which will be representative of its population. The adequacy of a 
sample (i.e., its lack of bias) will depend upon our knowledge of the 
population or supply * as well as upon the method used in drawing 
the sample. Commonly used sampling methods will be described in 
this section under four headings: random, stratified or quota,
incidental, and purposive.


(1) RANDOM SAMPLING 


The descriptive term “random” is often misunderstood. It does not 
imply that the sample has been chosen in an offhand, careless or hap- 
hazard fashion. Instead it means that we rely upon a certain method 
of selection (called “random”) to provide an unbiased cross section 
of the larger group or population. The criteria for randomness in a 
sample are met when (1) every individual (or animal or thing) in 
the population or supply has the same chance of being chosen for 
the sample; and (2) when the selection of one individual or thing 
in no way influences the choice of another. Randomness in a sam- 
ple is assured when we draw similar and well shaken-up slips out of 
a hat; or numbers in a lottery (provided it is honest); or a hand 
from a carefully shuffled deck of cards. In each of these cases selec- 
tion is made in terms of some mechanical process and is not subject 
to the whims or biases (if any) of the experimenter. 

A clear distinction should be made between representative and 
random samples. A representative sample is one in which the dis- 
tribution of scores in the sample closely parallels that of the popula- 
tion. Experience has shown that if one is asked to get representative 
samples from a population he will for various reasons (some not 
recognized) often draw samples which exhibit consistent biases of 
one sort or another. The most trustworthy way of securing
representativeness, therefore, is to make sure that the sampling is random. If
we draw samples at random from the population we know at least
that (a) there will be no consistent biases; (b) on the average these
samples will be representative; (c) the degree of discrepancy likely
to occur in any given sample can be determined by probability
methods. The SE formulas given in this chapter apply only to random
samples.

* A supply usually means a population of objects or things.

In research problems in psychology and in education three situa- 
tions arise in connection with the drawing of a random sample: (a) 
the members of the population or supply are on file or have been 
catalogued in some way; (b) the form of the distribution of the 
trait in the population is known to be (or can reasonably be assumed 
to be) normal; (c) the population is known only in general terms.
These situations will be discussed in order.

(a) Members of population are on file or are catalogued. If the 
population has been accurately listed, a type of systematic selection 
will provide what is approximately a random sample. Thus we may 
take every fifth or tenth name (depending upon the size of the sam- 
ple wanted) in a long list, provided names have been put in alpha- 
betical order and are not arranged with respect to some differential 
factor, such as age, income or education. (A better plan in such cases 
is to assign numbers to the members of the population and draw a 
sample as described below.) By this method an approximately 
random sample of telephone users may be obtained by reference to 
the telephone directory; of sixth grade children from attendance 
rolls; of automobile owners from the licensing bureau; of workers in 
a factory from payroll lists. Random samples of the population with
respect to a variety of characteristics may be drawn in the same way 
from census data. 
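Systematic selection from a listed population amounts to taking every k-th entry. A minimal sketch follows; the function name, the `step` and `start` parameters, and the made-up roll of 100 numbered names are all illustrative. As the text cautions, the result is only approximately random, and only when the order of the list is unrelated to the trait under study.

```python
import random


def systematic_sample(listing, step, start=0):
    """Take every `step`-th entry of an ordered listing, beginning at index `start`."""
    if step < 1:
        raise ValueError("step must be at least 1")
    if not 0 <= start < step:
        raise ValueError("start must lie between 0 and step - 1")
    return listing[start::step]


if __name__ == "__main__":
    roll = ["pupil %03d" % i for i in range(1, 101)]   # hypothetical attendance roll
    start = random.randrange(10)                        # starting point chosen by chance
    print(systematic_sample(roll, 10, start))           # every tenth name
```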

Systematic selection from a catalogued population is often used in 
determining the acceptance rate of industrial products. Thus in 
sampling machine-produced articles for defectives, a random sample 
may be obtained by taking every tenth article, say, as it comes from 
the machine. Sampling of this sort is justified if the manufactured 
articles are taken just as they come from the machine, so that sys- 
tematic selection provides an approximately random sample from the 
supply. 

When the subjects in a group are to be assigned at random to one 
or more experimental and control sub-groups, tables of random numbers
may be used to good purpose.* In such tables, numbers arranged
by a chance procedure are printed in sequence. The tenth block of
25 numbers taken from Fisher's and Yates' Table and reproduced
below will serve as an example.


34 50 57 74 37
85 22 04 39 43
09 79 13 77 48
88 75 80 18 14
90 96 23 70 00


* Fisher, R. A., and Yates, F., Statistical Tables (New York: Hafner
Publishing Co., 1948), Table 33.


The Fisher-Yates table is made up of 300 similar blocks of 25 num- 
bers, printed on 6 pages of 10 rows and 5 columns each. To read 
from the table one may begin at any point on any page and read in 
any direction, up or down, right or left. When all of the individuals 
in the entire group or population have been numbered in 1, 2, 3 order,
a random sample of any size can be drawn by following in order the
numbers read from the table. Suppose, for example, that a random 
sample of 25 is to be drawn from a larger “population” of 100. Then 
if we have decided beforehand to start with the second column in the 
block above and read down, individuals numbered 50, 22, 79, 75, 
and 96 will be included. Other blocks chosen in advance may be used 
to provide the additional 20 subjects. If the same number occurs 
twice, the second draw is disregarded. 
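The procedure just described (number the individuals, read numbers off in order, disregard repeats) can be imitated with a pseudo-random generator standing in for the printed table. This is a sketch only: the function name and the `seed` parameter are illustrative, and a computer's generator is a substitute for, not a reproduction of, the Fisher-Yates table.

```python
import random


def draw_random_sample(population_size, sample_size, seed=None):
    """Draw distinct identification numbers from 1..population_size."""
    if sample_size > population_size:
        raise ValueError("sample cannot be larger than the population")
    rng = random.Random(seed)       # a fixed seed makes the draw repeatable
    chosen = []
    seen = set()
    while len(chosen) < sample_size:
        number = rng.randint(1, population_size)
        if number in seen:
            continue                # same number occurs twice: disregard it
        seen.add(number)
        chosen.append(number)
    return chosen


if __name__ == "__main__":
    # A random sample of 25 from a "population" of 100, as in the text.
    print(draw_random_sample(100, 25, seed=1))
```

The standard library call `random.sample(range(1, 101), 25)` gives the same kind of result in one line; the loop is written out here only to mirror the reading of a table.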

(b) Distribution of trait in population known. As a result of much
research in individual differences many physical and mental traits 
are believed to be normally distributed (at least approximately) in 
the population. If we are justified in assuming that the trait or 
ability in which we are interested is normally distributed in the 
general population, a sample drawn at random from this population 
will itself tend toward normality, so that symmetry of distribution 
becomes one criterion of sample adequacy. 

(c) Population known only in general terms. In many problems 
in psychology and in education the population is (1) not clearly de- 
fined, (2) not readily accessible for sampling (for example, the 
population of a state), and (3) very expensive to sample extensively. 
Under conditions such as these a useful test of the adequacy of a 
sample consists in drawing several samples at random and in succes- 
sion from the population, such samples to be of approximately the 
same size as the sample with which we are working. Random sam- 
ples of ten-year-old school boys in a large school system, for instance, 
must be drawn without bias as to able, mediocre, or poor individuals;
they cannot be drawn exclusively from poor neighborhoods, from
expensive private schools, or from any larger group in which special 
factors are likely to make for systematic differences. 

When the means and σ's of these presumably random samples are
closely alike we may feel reasonably sure that our samples are repre- 
sentative of their population. If the correspondence among samples 
is not close we must re-examine each sample for bias. This test 
has been criticized on the grounds that (1) the correspondence of two 
or more samples may reflect nothing more than a common bias, and 
(2) consistency is not a sufficient criterion of randomness. While 
this is true and the test is admittedly rough, it may be argued that 
a reasonable consistency among samples is a necessary first condition
of randomness. If samples are fairly consistent, therefore,
they are presumably random unless subsequent examination reveals 
a common bias. If samples differ widely, we cannot be sure that any 
is random. 


(2) STRATIFIED OR QUOTA SAMPLING 


Stratified or quota sampling (also called "controlled" sampling) 
is a technique designed to insure representativeness and avoid bias 
by use of a modified random sampling method. This scheme is 
applicable when the population is composed of sub-groups or strata 
of different sizes, so that a representative sample must contain 
individuals drawn from each category or stratum in accordance 
with the sizes of the sub-groups. Within each stratum or sub-group 
the sampling is random—or as nearly so as possible. Stratified 
sampling is illustrated in the standardization of the 1937 Stanford-Binet
Scale in the course of which approximately 3000 children
were tested. To insure an adequate selection of American
youth, the occupational levels of the parents of the children in
the standard group were checked against the six occupational levels
of employed males in the general population as shown by the U.S.
Census of 1930. Differing proportions of men were found in the
groups classified as professionals, semi-professionals, businessmen,
farmers, skilled laborers, slightly skilled and unskilled laborers. Only
4% of employed males were found in the professional group, while
31% were in the skilled labor group. Accordingly, only 4% of the
children in the Stanford-Binet standardization group could have
fathers in the professional category, while 31% could have fathers in
the skilled labor group. In public opinion polling, the investigator
must see that his sample takes account of various strata or criteria
such as age, sex, political affiliation, urban and rural residence,
etc.

When sampling is stratified, the SE formula for the mean differs
slightly from the SE_M formula when sampling is strictly random.
The new formula is

$$\sigma_M = \sqrt{\frac{\sigma^2 - \sigma_s^2}{N}} \qquad (49)$$

(SE of M when sampling has been stratified)

in which σ = SD of the entire sample
         σ_s = SD of the means of the various strata around the mean
               of the entire sample.

A convenient formula for σ_s is

$$\sigma_s = \sqrt{\frac{N_1(M_1 - M)^2 + N_2(M_2 - M)^2 + \cdots + N_k(M_k - M)^2}{N}} \qquad (50)$$

(standard deviation of the means of strata around the mean
of the entire group)

in which N_1, N_2 ... N_k = number of cases in strata 1 to k; and
N and M are the size and mean of the whole sample.

To illustrate formula (49), suppose that in a sample of 400 cases,
there are 8 sub-groups or strata which vary in size from 70 to 25.
The M of the whole sample is 80 and σ is 15. The SD of the means
of the 8 strata [by (50)] around the general mean of 80 is known to
be 5. Substituting in (49) we have

$$\sigma_M = \sqrt{\frac{225 - 25}{400}} = \sqrt{\frac{200}{400}} = .71$$

Had no account been taken of the variation in the sub-groups, σ_M
would have been √(225/400) or .75. Unless the various strata introduce
considerable variation, it is obvious that the correction got by using
(49) instead of (39) is fairly small.
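The arithmetic of formulas (49) and (50) can be sketched in a few lines. The function and parameter names are illustrative; `se_mean_random` assumes that formula (39), which is not reproduced in this passage, is the usual σ/√N, an assumption consistent with the value of .75 quoted above.

```python
import math


def sd_of_strata_means(sizes, means):
    """Formula (50): SD of the strata means around the mean of the whole sample."""
    n = sum(sizes)
    grand_mean = sum(k * m for k, m in zip(sizes, means)) / n
    weighted_squares = sum(k * (m - grand_mean) ** 2 for k, m in zip(sizes, means))
    return math.sqrt(weighted_squares / n)


def se_mean_stratified(sd_total, sd_strata, n):
    """Formula (49): SE of the mean when sampling has been stratified."""
    return math.sqrt((sd_total ** 2 - sd_strata ** 2) / n)


def se_mean_random(sd_total, n):
    """SE of the mean under strictly random sampling (assumed form of formula 39)."""
    return sd_total / math.sqrt(n)


if __name__ == "__main__":
    # The illustration in the text: N = 400, SD = 15, SD of strata means = 5.
    print(round(se_mean_stratified(15, 5, 400), 2))
    print(round(se_mean_random(15, 400), 2))
```

Since the squared SD of the strata means is subtracted under the radical, the stratified SE can never exceed the random-sampling SE, and the two coincide when the strata means do not differ.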

(3) INCIDENTAL SAMPLING 


The term incidental sampling (also called “accidental” sam- 
pling) should be applied to those groups which are used chiefly 
because they are easily or readily obtainable. School children,
college sophomores enrolled in psychology classes, and laboratory
animals are available at times, in numbers, and under conditions
none of which may be of the experimenter's choosing. Such casual
groups rarely constitute random samples of any definable population.
Reliability formulas apply only very roughly, if at all, to
incidental samples. And generalizations based upon
such data are often misleading.


(4) PURPOSIVE SAMPLING 


A sample may be expressly chosen because, in the light of avail- 
able evidence, it mirrors some larger group with reference to a given 
characteristic. Newspaper editors are believed to reflect accurately 
public opinion upon various social and economic questions in their 
sections of the country. A sample of housewives may represent 
accurately the buyers of canned goods; a sample of brokers, the 
opinion of financiers on a new stock issue. If the saying “As Maine 
goes, so goes the Nation” is accepted as correct, then Maine becomes 
an important barometer (a purposive sample) of political thinking. 
Random sampling formulas apply more or less accurately to pur- 
posive samples. 


2. Size of sample 


The reliability of M or σ depends (p. 182) upon the size of the
sample upon which the SE is based. SE's vary inversely as the 
square root of sample size so that the larger the N in general the 
smaller the SE. A small sample is often satisfactory in an inten- 
sive laboratory study in which many measurements are taken upon 
each subject. But if N is less than 25, say, there is often little reason 
for believing such a small group of persons to be adequately descrip- 
tive of any population. 

The larger the N the larger the SD of the sample and the more 
inclusive (and presumably representative) our sample becomes of 
the general population. The range covered by samples of different 
sizes—when all are drawn from a normal population—will be 
approximately as follows: 


N = 10      Range ± 2.0σ
N = 50      Range ± 2.5σ
N = 200     Range ± 3.0σ
N = 1000    Range ± 3.5σ

A range of ±3.5σ from the mean includes 9995 cases in 10,000 in a
normally distributed population. In a sample of 10,000 only 5 cases
lie outside of this range; in a sample of 100 cases none lies outside
of this range. The more extreme the score, large or small, the less
the probability of its occurrence in a small sample. In fact, widely
deviant scores will rarely appear in a very small random
sample drawn from a normal group.
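The figure of 5 cases in 10,000 beyond ±3.5σ can be verified from the normal curve with the complementary error function. This is a sketch with illustrative function names; the counts it returns are expected values over many samples, not guarantees for any single sample.

```python
import math


def fraction_outside(k):
    """Proportion of a normal population lying beyond +/- k sigma from the mean."""
    return math.erfc(k / math.sqrt(2.0))


def expected_cases_outside(n, k):
    """Expected number of cases beyond +/- k sigma in a random sample of n."""
    return n * fraction_outside(k)


if __name__ == "__main__":
    print(round(expected_cases_outside(10000, 3.5)))   # about 5 in 10,000
    print(expected_cases_outside(100, 3.5))            # well under one case in 100
```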

A fairly simple and practical method of deciding when a sample 
is “sufficiently large” is to increase N until the addition of extra 
cases, drawn at random, fails to produce any appreciable change 
(more than ±1 SE_M, say) in the M and σ. When this point is
reached, the sample is probably large enough to be taken as
adequately descriptive of its population. But the corollary must be
recognized that mere numbers in and of themselves do not guarantee
a random sample. (See also p. 114.)


3. Sampling fluctuations and errors of measurement 


SE's measure (1) errors of sampling and (2) errors of measurement.
We have already considered the question of sampling errors
on page 185. The investigator in establishing generalizations from
his data regarding individual differences, say, must perforce make
his observations upon limited groups or samples drawn at random
from the population. Owing to differences among individuals and
groups, plus chance factors (errors of measurement), neither the
sample in hand nor another similarly drawn and approximately of
the same size will describe the population exactly. Hence it is
unlikely that M's and σ's from successive samples will equal each other.
Variations from sample to sample (the so-called “errors” of sampling)
are not to be thought of as mistakes, failures and the like,
but as fluctuations arising from the fact that no two samples are
ever quite alike. Means and σ's from random samples are, then,
estimates of their parameters, and the SE formulas measure the
goodness of this estimate.

The term errors of measurement includes all of those variable
factors which affect test scores, sometimes in the plus and sometimes
in the minus direction. If the SE_M is large, it does not follow
necessarily that the mean is affected by a large sampling error, as much of
the variation may be due to errors of measurement. When errors of
measurement are low, however (reliability of tests high, see p. 348),
a large SE_M indicates considerable sampling error.


4. Bias in sampling and constant errors


Errors which arise from inadequate sampling or from bias of any
sort are neither detected nor measured by reliability formulas. The
mean score on an aptitude test achieved by 200 male college freshmen
in a college of high admission standards will not be representative
of the aptitude of the general male population between the ages
of 18 and 21, say, and for this reason the SE_M for this group is not an
adequate measure of sampling fluctuations. College freshmen usually
constitute an incidental, and often a highly biased, sample. In
consequence, other samples of young men 18-21, drawn at random
from the male population, will return very different means and σ's
from those in our group. Differences like these are not sampling
fluctuations but are errors due to inadequate or biased selection.
Reliability formulas do not apply.

SE's do not detect constant errors. Such errors work in only one
direction and are always plus or minus. They arise from many
sources—familiarity with test materials prior to examination, cheat- 
ing, fatigue, faulty techniques in administering and in scoring tests, 
in fact from a consistent bias of any sort. SE's are of doubtful value 
when computed from scores subject to large constant errors. The 
careful study of successive samples, rechecks when possible, care in 
controlling conditions, and the use of objective tests will reduce 
many of these troublesome sources of error. The research worker 
cannot learn too early that even the best statistical techniques are 
unable to make bad data yield valid results. 




PROBLEMS 


1. Given M = 26.40; σ = 5.20; N = 100
   (a) What is the probable divergence of this M from its parameter
       (true mean) at the .01 level of confidence?
   (b) What is the probable divergence of σ from its true (population)
       value at the .05 level of confidence?
   (c) Find the .99 confidence-interval for the true mean.

2. The mean of 16 independent observations of a certain magnitude is
   100 and the SD is 24.
   (a) At the .05 confidence level what are the fiduciary limits of the true
       mean? (p. 189)
   (b) Taking the .99 confidence-interval as our standard, we may be
       assured that the true mean is at least as large as what value?

3. For a given group of 500 soldiers the mean AGCT score is 95.00 and
   the SD is 25.
   (a) Determine the .99 confidence-interval for the true mean.
   (b) It is unlikely that the true mean is larger than what value?

4. The mean of a large sample is K and σ_K is 2.50. What are the chances
   that the sample mean misses the true mean by more than (a) ±1.00;
   (b) ±3.00; (c) ±10.00?

5. The following measures of perception span for unrelated words are
   obtained from 5 children: 5 6 4 7 5
   (a) Find the .99 confidence-interval for the true mean of these scores.
   (b) Compare the fiduciary limits (.99 confidence-interval) when
       calculated by large sample methods with the result in (a).

6. Suppose it is known that the SD of the scores in a certain population
   is 20. How many cases would we need in a sample in order that the SE
   (a) of the sample M be 2?
   (b) of the sample SD be 1?

7. In a sample of 400 voters, 50% favor the Democratic candidate for
   president. How often can we expect polls based on random samples of
   400 to return percents of 55 or more in favor of the Democrats?

8. Opinion upon an issue seems about equally divided. How large a sample
   (N) would you need to be sure (at .01 level) that a deviation of
   3% in a sample is not accidental (due to chance)?

9. Given an r of .45 based upon 60 cases,
   (a) Using formula (47), p. 197, find the SE_r. Determine the limits of
       the .99 confidence-interval for the population r.
   (b) Convert the given r into z, and find σ_z by formula (48). Check the
       limits of the .99 confidence-interval determined from σ_z against
       those found in (a) above.
   (c) Is the given r significant at the .01 level? (Use Table 25.)

10. An r of .81 is obtained from a random sample of 37 cases.
    (a) Establish the fiduciary limits of the true r at the .01 level, using the
        z-conversion.
    (b) Check the significance of r from Table 25.

11. Given a sample of 500 cases in which there are six sub-groups or strata.
    The means of the six sub-groups are 50 (N = 100), 54 (N = 50), 46
    (N = 100), 50 (N = 120), 58 (N = 80), 42 (N = 50). The SD for
    the entire sample is 12.
    (a) Find the mean of the whole sample of 500 (p. 272).
    (b) Compute the σ_M by formula (49) (p. 206).
    (c) Compare σ_M by formula (39) with the result found in (b).

12. Fill in the following table:

         Size of Sample     df
              (N)         (N − 2)      r       Significance
    (a)        15            13       −.68
    (b)        30            28        .22
    (c)        82            80       −.40
    (d)       225           223        .05


ANSWERS


1. (a) We may be confident at the .01 level that the obtained M does not
       miss the true M by more than ±1.34 (true M ± 1.34).
   (b) ±.73 (true σ ± 1.96 × .37).
   (c) 27.74 to 25.06.
2. (a) 112.78 and 87.22
   (b) 82.3
3. (a) 97.89 to 92.11
   (b) 97.89
4. 69 in 100; 23 in 100; less than 1 in 100
5. (a) 7.75 to 3.05
   (b) By large sample methods (±2.58σ_M) fiduciary limits are 6.59 to
       4.21.
6. (a) 100 (b) 202
7. About once in 50 trials
8. 1850
9. (a) .72 to .18 (b) .67 to .15 (c) Yes
10. (a) .91 to .60
    (b) Significant at .01 level
11. (a) 50.08 (b) .495 (c) .537 vs. .495
12. (a) Significant at .01 level
    (b) Not significant
    (c) Significant at .01 level
    (d) Not significant


THE RELIABILITY OF THE DIFFERENCE
BETWEEN MEANS AND OTHER MEASURES 


* 


1. The Significance of Differences between Means 
and Medians 


Suppose that we wish to discover whether ten-year-old boys and 
ten-year-old girls differ in mechanical aptitude. In attacking this 
problem, ordinarily we would first secure as large and as representa- 
tive a sample of ten-year-old boys and ten-year-old girls as possible, 
administer our mechanical aptitude tests, compute means and σ's,
and find the difference between the two means. A large mean difference
in favor of the boys would offer strong evidence that boys of ten
are mechanically more apt than are girls of ten. Contrariwise, a
small difference (not more than 2-3 points, for example) would
clearly be unimpressive, and would suggest that further comparative
tests might well show no difference at all between the two
groups.

When can we feel reasonably sure that a difference is large enough
to be taken as real and dependable? This question involves the 
reliability of the measures compared, and its answer can rarely be 
stated in unequivocal terms. Reliability, as we found in Chapter 8, 
is always relative and can be stated only in terms of probability. A 
given difference is called reliable or significant when the probability 
is high that it cannot be explained away as temporary or accidental. 
And a difference is called non-significant when it appears to be rea- 
sonably certain that it could easily have arisen from sampling fluc- 
tuations (or sampling accidents) and hence implies no “real” or true 
difference. 


1. The null hypothesis


Experimenters have found the null hypothesis a useful tool in testing
the reliability of differences. In its simplest form (see p. 247), this
hypothesis asserts that there is no true difference between two population
means, and that the difference found between sample means is,
therefore, accidental and unimportant. The null hypothesis is akin
to the legal principle that a man is innocent until he is proved guilty. 
It constitutes a challenge; and the function of an experiment is to 
give the facts a chance to refute (or fail to refute) this challenge. 
To illustrate, suppose it is claimed that Eskimos have keener vision 
than Americans. This hypothesis is vaguely stated and cannot be 
tested precisely as we do not know how much better the Eskimo’s 
vision must be before it can be adjudged “keener.” If, however, we 
assert that Eskimos do not possess keener vision than Americans, or 
that the differences are trifling and unimportant (the true difference 
being zero), this null hypothesis is exact and can be tested. If our 
null hypothesis is untenable it must be rejected. And in discarding 
our null hypothesis, what we are saying is that differences in visual 
acuity as between Eskimos and Americans cannot be fully explained 
as temporary and occasional. 


2. The reliability of the difference between two independent means 


In order to discover whether two groups differ sufficiently in mean 
performance to enable us to say with confidence that a difference will 
persist upon repetition of the experiment, we need a standard error 
of the difference between the two means. Two situations with respect
to mean differences arise: those in which the means are uncorrelated 
and those in which the means are correlated. 


(1) THE SE OF THE DIFFERENCE (σ_D) WHEN MEANS ARE UNCORRELATED

The formula for the SE of the difference between uncorrelated or
independent means is

$$\sigma_D \text{ or } \sigma_{M_1 - M_2} = \sqrt{\sigma_{M_1}^2 + \sigma_{M_2}^2}$$

or

$$\sigma_D \text{ or } \sigma_{M_1 - M_2} = \sqrt{\frac{\sigma_1^2}{N_1} + \frac{\sigma_2^2}{N_2}} \qquad (51)$$

(standard error of the difference between two uncorrelated means)

in which σ_M1 is the SE of the mean of the first group; σ_M2 is the SE
of the mean of the second group; and σ_D is the SE of the difference
between the two means. Means are uncorrelated when calculated
from different groups, or from uncorrelated tests administered to the
same group. From formula (51) it is clear that one way to find the
SE of the difference between two means is first to compute the SE's
of the two means themselves. Another way is to calculate σ_D
directly if σ_M1 and σ_M2 are not wanted.

Application of formula (51) is illustrated by the following example:


Example (1) In a study of the intelligence of the foreign-born 
white draft during World War I, a sample of 611 native-born Nor- 
wegians and a sample of 129 native-born Belgians were found to 
test as follows on the “combined scale.” * 


Country of Birth    Number of Cases    Mean Score      σ
Norway                   611             12.98        2.47
Belgium                  129             12.79        2.42


Would further testing of similar samples of Norwegians and Bel- 
gians give virtually this same result; or in further testing would 
the mean difference perchance be reduced to zero, or even reversed 
in favor of the Belgians? 


To answer these questions we have first computed the SE's of the 
two means and from these the SE of the difference between the two 
means. By formula (39) the SE's of the means are 


Norwegians: $\sigma_{M_1} = \dfrac{2.47}{\sqrt{611}} = .0999$ 

Belgians: $\sigma_{M_2} = \dfrac{2.42}{\sqrt{129}} = .2130$ 

Substituting these SE's in formula (51) we have 

$\sigma_D = \sqrt{(.0999)^2 + (.2130)^2} = .24$ (to two decimals) 
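
The arithmetic of Example (1) can be retraced in a few lines of Python. This is only a sketch: the function names `se_mean` and `se_diff_uncorrelated` are our own labels, not part of the text, and the SE of each mean is taken as SD divided by the square root of N, as in the computation above.

```python
import math

def se_mean(sd: float, n: int) -> float:
    """SE of a mean, computed as sd / sqrt(n) as in the worked example."""
    return sd / math.sqrt(n)

def se_diff_uncorrelated(se_1: float, se_2: float) -> float:
    """Formula (51): SE of the difference between two uncorrelated means."""
    return math.sqrt(se_1 ** 2 + se_2 ** 2)

# Data of Example (1): SD and N for each group.
se_norway = se_mean(2.47, 611)
se_belgium = se_mean(2.42, 129)
se_d = se_diff_uncorrelated(se_norway, se_belgium)

print(f"SE of mean, Norway:  {se_norway:.4f}")
print(f"SE of mean, Belgium: {se_belgium:.4f}")
print(f"SE of difference:    {se_d:.2f}")
```

Unrounded arithmetic may differ from the printed figures in the last decimal place.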


The actual difference between the means of Norwegians and Bel- 
gians, then, is .19 (12.98 − 12.79) and the SE of this difference ($\sigma_D$) 
is .24. In inquiring whether the two groups actually differ in mean 
performance, we shall set up a null hypothesis, namely, that the 
difference between the population means of Norwegians and Belgians 
is zero, and that—except for accidental errors—mean differences 
from sample to sample would all be zero. Stated specifically, we ask 
whether—in view of its SE—the mean difference of .19 is really large 
enough to cast grave doubt upon our null hypothesis. 

* The "combined scale" included the 8 Alpha tests, the Stanford-Binet, and 
tests 4, 5, 6, and 7 from Beta. The maximum score was 25. For the data given 
in this problem, see Brigham, C. C., A Study of American Intelligence 
(Princeton: Princeton University Press, 1923), pp. 120-121. 

As a first step in making our test we compute a critical ratio, or 
CR, by dividing the obtained difference by its SE ($CR = D/\sigma_D$).* In 
the present problem, the CR = .19/.24 or .79. The distribution of 
CR's is known to be normal around the population or true difference 
when N is large. Hence, in testing our null hypothesis, we may set 
up a normal distribution like that shown in Figure 47, in which the 
mean is set at zero (true difference) and the $\sigma$ of the distribution of 
differences is .24 ($\sigma_D$). From the critical ratio our obtained 
difference of .19 is seen to fall at a point $.79\sigma_D$ from the hypothetical 
mean of zero; and the difference of −.19 falls at $-.79\sigma_D$. 


Now from Table A we know that 29% × 2 or 58% of the cases in 
a normal distribution fall between the mean and $\pm .79\sigma_D$; and 42% 
of the cases fall outside these limits. This means that under the 
stipulated conditions we can expect differences as large or larger 
than ±.19 to occur 42 times in 100 comparisons of Norwegians and 
Belgians. A difference as large as ±.19, therefore, might readily 
arise as a sampling fluctuation from zero and is clearly not signifi- 
cant. Accordingly, we retain the null hypothesis and conclude with 
confidence that, on the evidence, there is no real difference between 
Norwegians and Belgians on the "combined scale." When the null 
hypothesis is retained (as here) the result may be stated also as 
follows: there is good reason to believe that these two groups were 
drawn from the same population with respect to tested intelligence 
and differ only by sampling errors. 

* CR really equals $\dfrac{(M_1 - M_2) - 0}{\sigma_D}$ or $\dfrac{D - 0}{\sigma_D}$: the difference (D) between the 
two means is measured from zero in terms of $\sigma_D$ (see Fig. 47). 
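
As a rough check on the table lookup, the two-tailed probability for a given CR can be computed from the normal curve directly. The sketch below uses only the Python standard library; the helper names are ours, and the inputs are the rounded values .19 and .24 used in the text.

```python
import math

def normal_cdf(z: float) -> float:
    """Area of the unit normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def two_tailed_p(cr: float) -> float:
    """Probability of a deviation at least as large as |cr| in either direction."""
    return 2.0 * (1.0 - normal_cdf(abs(cr)))

obtained_difference = 0.19   # 12.98 - 12.79
se_difference = 0.24         # from formula (51)

cr = obtained_difference / se_difference
p = two_tailed_p(cr)
print(f"CR = {cr:.2f}, two-tailed P = {p:.2f}")
```

The computed probability agrees closely with the 42 in 100 obtained from the rounded table entries, and is far above the .05 level.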


(2) LEVELS OF SIGNIFICANCE 


The answer to the question of when a difference is to be taken as 
statistically significant depends upon the probability of the given 
difference arising "by chance" (p. 87); and it depends also upon the 
purposes of the experiment (p. 186). Usually a difference will be 
marked "significant" when the gap between the two sample means 
points to or signifies a true difference between the parameters in the 
population from which the samples were drawn. It would seem to be 
fairly obvious, then, that before a judgment of "significant" or 
"non-significant" can be made, some point or points must be found along 
a probability scale which will serve to separate these two judgment 
categories. At the same time, it must be recognized that judgments 
of significance are never all-or-none but range over a wide scale of 
probabilities, our confidence increasing as the probability of error 
decreases. 

Experimenters have for convenience chosen several arbitrary 
standards—called levels of significance—of which the .05 and .01 
levels are the most often used. The .05 and the .01 significance levels 
are analogous to the .05 and .01 levels of significance used in estimat- 
ing the reliability of the mean and other statistics (Chapter 8). The 
confidence with which an experimenter rejects—or retains—a null 
hypothesis will depend upon the level of significance reached. From 
Table D we know that $\pm 1.96\sigma$ mark off points in the normal dis- 
tribution to the left and right of which lie 5% of the cases (2½% at 
each end). When a CR is 1.96 or more, therefore, we reject a null 
hypothesis at the .05 level of significance—on the grounds that not 
more than once in 20 trials would a difference occur as large or larger 
than that obtained, if the true difference were zero. The CR of .79 
in the problem of Norwegians and Belgians (p. 214) falls short of 
1.96 (does not reach the .05 level of significance) and accordingly 
the null hypothesis is retained. 

The .01 level of significance is more exacting than is the .05 level. 
From Table D we know that $\pm 2.58\sigma$ mark off points to the left and 
right of which lies 1% of the cases in a normal distribution. If the 
CR is 2.58 or more, therefore, we reject the null hypothesis at the 
.01 level of significance, on the grounds that not more than once in 
100 trials would a difference of this size occur if the true difference 
were zero. The significance of a difference may also be evaluated by 
establishing confidence-intervals for the true difference—as was done 
for the true mean on page 187. Thus the limits specified by $D \pm 1.96\sigma_D$ 
define the .95 confidence-interval for the true D; and $D \pm 2.58\sigma_D$ 
define the .99 confidence-interval for the true D. By way of illustra- 
tion, we may again take the problem of comparing the intelligence 
of the Norwegians and Belgians on page 214 where the D = .19 
and the $\sigma_D$ = .24. The .99 confidence-interval for the true D is 
.19 ± 2.58 × .24, or from −.43 to .81. This relatively wide range and 
the fact that it runs from minus to plus through zero strengthens our 
confidence in the inference that the true D could well be zero. In 
fact, acceptance of the null hypothesis always means that zero lies 
within the confidence-interval for the true difference. 
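
The confidence-interval check just described can be sketched as follows. The multipliers 1.96 and 2.58 are those quoted from Table D; the function names `confidence_interval` and `contains_zero` are hypothetical labels of our own.

```python
def confidence_interval(diff: float, se: float, z: float) -> tuple[float, float]:
    """Limits D - z*SE and D + z*SE for the true difference."""
    margin = z * se
    return diff - margin, diff + margin

def contains_zero(interval: tuple[float, float]) -> bool:
    """True when zero lies within the interval, i.e. the null hypothesis is retained."""
    lower, upper = interval
    return lower <= 0.0 <= upper

d, se_d = 0.19, 0.24                            # Norwegians vs. Belgians
ci_95 = confidence_interval(d, se_d, 1.96)      # .95 confidence-interval
ci_99 = confidence_interval(d, se_d, 2.58)      # .99 confidence-interval

for label, ci in ((".95", ci_95), (".99", ci_99)):
    print(f"{label} interval: {ci[0]:.2f} to {ci[1]:.2f}; "
          f"includes zero: {contains_zero(ci)}")
```

Both intervals run through zero, which is the same verdict as the CR of .79 falling short of 1.96.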


(3) TWO-TAILED AND ONE-TAILED TESTS OF SIGNIFICANCE 


Under the null hypothesis, differences between obtained means 
(i.e., $M_1 - M_2$) may be either plus or minus and as often in one 
direction as in the other from the true difference of zero, so that in 
determining probabilities we take both tails of the sampling distribu- 
tion (Fig. 47). This two-tailed test, as it is sometimes called, is the 
most general test of significance. It should generally be used when, 
in accordance with the null hypothesis, our two groups have conceiv- 
ably been drawn from the same population with respect to the trait 
being measured [see Example (1) above]. 

In many experiments our primary concern is with the direction of 
the difference rather than with its existence in absolute terms.* This 
situation arises when negative differences, if found, are of no impor- 
tance practically; or when a difference if it exists at all must of neces- 
sity be positive. Suppose, for example, that we wish to determine the 
increase in vocabulary resulting from additional weekly reading 
assignments, or want to evaluate the gain in numerical computation 
brought about by an extra hour of drill per day. It is unlikely that 
additional reading will lead to an actual loss in vocabulary. More- 
over, if drill decreases arithmetic skill it would be the same as 
though it had no effect—in either event we would drop the drill. 
Only an increase as a result of drill, therefore, is of any practical 
interest. 

* Jones, Lyle V., "Tests of Hypotheses: One-sided vs. Two-sided Alterna- 
tives," Psychol. Bull., 1952, 49, 43-46. 

In cases like these the one-tailed test of significance is appropriate. 
We may illustrate with Example (2). 


Example (2) We know from experience that intensive coach- 
ing increases reading skill. Therefore, if a class has been coached, 
our hypothesis is that it will gain in reading comprehension—fail- 
ure to gain or a loss in score is of no interest. At the end of a school 
year, Class A, which had received special coaching, averaged 5 
points higher on a reading test than Class B, which had received 
no coaching. The standard error of this difference was 3. Is the 
gain significant? 


To evaluate the 5 points gained, i.e., determine its significance, we 
must use the one-tailed and not the two-tailed test. The critical ratio 
is 5/3 or 1.67, and from Table D we find that 10% of the cases in a 
normal distribution lie beyond $\pm 1.65\sigma$, so that 5% (P/2) 
lie to the right of $1.65\sigma$. Our critical ratio of 1.67 just exceeds 1.65 and 
is, therefore, significant at the .05 level. We reject the null hypothe- 
sis, therefore, since only once in 20 trials would a gain as large or 
larger than 5 occur by chance. When a critical ratio is 2.33 (P = .02 
and P/2 = .01) we mark a positive difference significant at the .01 
level. It may be noted that in using the one-tailed test the experi- 
menter sets up the hypothesis he wishes to test before he takes his 
data. This means that the experiment is designed at the outset to 
test the hypothesis; an hypothesis cannot be proposed to fit the data 
after they are in. If in Example (2) we had been interested simply in 
whether Class A and Class B were significantly different in reading 
score, the two-tailed test would have been appropriate. As we have 
seen, the two-tailed test gives us the probability of a mean positive 
difference of 5 points (A ahead of B), together with the probability of 
a mean negative difference (loss) of 5 points (B ahead of A). This is 
true since under the null hypothesis fluctuations of sampling alone will 
tend to show A-samples better than B-samples, and B better than A, 
about equally often. A difference in favor of either A or B, there- 
fore, is possible and equally acceptable. 
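
The contrast between the one-tailed and two-tailed readings of Example (2) can be made concrete with a short computation. This is a sketch under the same normal-curve assumption as the text; `upper_tail_p` is our own helper name.

```python
import math

def upper_tail_p(cr: float) -> float:
    """Area of the unit normal curve to the right of cr (one tail only)."""
    return 0.5 * math.erfc(cr / math.sqrt(2.0))

gain = 5.0       # Class A minus Class B
se_gain = 3.0    # standard error of the difference

cr = gain / se_gain
p_one_tailed = upper_tail_p(cr)
p_two_tailed = 2.0 * p_one_tailed

print(f"CR = {cr:.2f}")
print(f"one-tailed P = {p_one_tailed:.3f}")
print(f"two-tailed P = {p_two_tailed:.3f}")
```

The same CR reaches the .05 level under the one-tailed test but not under the two-tailed test, which is why the direction of the hypothesis must be fixed before the data are taken.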

The one-tailed test should be used when we wish to determine the 
probability of a score occurring beyond a stated value. An illustra- 
tion is given in Example (3) below. 

Example (3) In certain studies of deception among school chil- 
dren the scores achieved on tests given under conditions in which 
cheating was possible were compared with scores achieved by com- 
parable groups under strictly supervised conditions. In a certain 
test given under "honest" conditions the mean is 62 and the $\sigma$ is 
10. Several children who took the test under non-supervised con- 
ditions turned in scores of 87 and above. Is it probable that these 
children cheated? 


The mean of 62 is 24.5 score units from 86.5, the lower limit of 
score 87. Dividing 24.5 by 10 we find that scores of 87 and above 
lie at the point $2.45\sigma$ above the mean of 62. On the assumption of 
normality of distribution, there is less than one chance in 100 that 
a score of 87 or more will appear in the “honest” distribution. While 
scores of 87 and above might, of course, be “honest,” examinees who 
make such scores under non-supervised conditions are certainly open 
to suspicion of having cheated. The one-tailed test is appropriate 
here as we are concerned only with the positive end of the distribu- 
tion—the probability of scores of 87 and above. 
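
A minimal sketch of the computation in Example (3), assuming as the text does that the "honest" scores are normally distributed and that a score of 87 begins at its lower limit of 86.5. The function name is our own.

```python
import math

def z_for_score_or_above(score: int, mean: float, sd: float) -> float:
    """Sigma-distance of the lower limit of `score` (score - 0.5) from the mean."""
    lower_limit = score - 0.5
    return (lower_limit - mean) / sd

z = z_for_score_or_above(87, mean=62, sd=10)
p_upper = 0.5 * math.erfc(z / math.sqrt(2.0))   # one tail: scores of 87 and above

print(f"z = {z:.2f}")
print(f"P(score of 87 or more under 'honest' conditions) = {p_upper:.4f}")
```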


(4) ERRORS IN MAKING INFERENCES 


In testing hypotheses two types of wrong inference can be made 
and must be reckoned with by the experimenter.* What are called 
Type I errors are present when the hypothesis is true but our test of 
significance leads us to believe it to be false; Type II errors arise 
when the hypothesis is false, but our test of significance leads us to 
believe it to be true. Stated in different terms, we make an error of 
Type I if we reject the null hypothesis when it is true—claim signifi- 
cance when none exists; and we commit an error of Type II if we 
accept the null hypothesis when it is false—mark a finding not-sig- 
nificant when a real difference is present. 

Various precautions must be taken to avoid both sorts of erroneous 
inference. A low significance level (P greater than .05, say) increases 
the possibility of Type I errors; and a high significance level (.05 to 
.01) renders such erroneous inferences less likely. How this works 
out can perhaps be shown best by a simple example. Suppose that a 
quarter known to us to be a good coin is suspected by an experi- 
menter of a bias in favor of heads.† When our experimenter tosses 
this coin 10 times, it turns up 8 heads and 2 tails. The theoretical 
expectation for a good coin is, of course, 5 heads and 5 tails; and the 
specific question for the experimenter to decide is whether the occur- 
rence of 8 heads represents a "heads" bias—a significant deviation 
from the expected 5 heads. The distribution of heads and tails ob- 
tained when a single coin is tossed 10 times is given by expansion 
of the binomial $(p + q)^{10}$, where p = probability of a head and 
q = probability of a tail (non-head). The mean of $(p + q)^n$ is $np$ 
and the SD is $\sqrt{npq}$; hence in our example the mean is 5 and the SD 
is $\sqrt{10 \cdot 1/2 \cdot 1/2}$ or 1.58. A "score" of 8 extends over the interval 
7.5-8.5, so that to determine the probability of 8 or more, the CR we 
wish is $\dfrac{7.5 - 5}{1.58}$ or 1.58. (See Fig. 48.) (A problem similar to this 
will be found on p. 252). From Table A we know that 8 or more 
heads, that is, a CR of 1.58, may be expected on the null hy- 
pothesis approximately 6 times in 100 trials.* If our experi- 
menter is willing to accept P = .06 as significant (i.e., set his stand- 
ards low), he will reject the null hypothesis—although it is true. 
That is, he will report the coin to be biased in favor of heads, 
although it is in fact a good coin. 

* Treloar, Alan E., Elements of Statistical Reasoning (New York: Wiley and 
Sons, 1939), Chap. 10, pp. 149-151. 
McNemar, Q., Psychological Statistics (New York: Wiley and Sons, 1949), 
pp. 1. 
† If a coin is "leaded" or weighted on the "tails" side, the "heads" side, being 
lighter, will tend to appear more often than tails. 

If our experimenter had set his significance level higher (say .01 or 
even .05) he would have avoided this erroneous inference. Further- 
more, had he increased the number of tosses of the coin from 10 to 
100 or even 500, he might have avoided his wrong inference, as heads 
and tails in a good coin will tend to occur equally often. Increas- 
ing the experimental data gives the null hypothesis a chance to assert 
itself (if true) and guards against freak results. We should not be 
willing to reject a null hypothesis too quickly, as in so doing we 
must assert the existence of a real difference—often a heavy 
responsibility. 

* This is a one-tailed test (p. 217) because our experimenter's hypothesis was 
that the coin is biased in favor of heads. 
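
The coin illustration lends itself to a direct check. The sketch below computes the CR for "8 or more heads in 10 tosses" with the half-unit correction used in the text, then compares the normal-curve probability with the exact binomial probability. The helper names are ours.

```python
import math

def cr_at_least(k: int, n: int, p: float = 0.5) -> float:
    """CR for 'k or more successes', measured from the lower limit k - 0.5."""
    mean = n * p
    sd = math.sqrt(n * p * (1.0 - p))
    return (k - 0.5 - mean) / sd

def normal_upper_tail(z: float) -> float:
    """Area of the unit normal curve to the right of z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def exact_upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """Exact binomial probability of k or more successes in n trials."""
    return sum(math.comb(n, i) * p ** i * (1.0 - p) ** (n - i)
               for i in range(k, n + 1))

cr = cr_at_least(8, 10)
p_normal = normal_upper_tail(cr)
p_exact = exact_upper_tail(8, 10)   # (45 + 10 + 1) / 1024

print(f"CR = {cr:.2f}")
print(f"normal approximation: P = {p_normal:.3f}")
print(f"exact binomial:       P = {p_exact:.3f}")
```

Both figures round to roughly 6 chances in 100, so an experimenter who accepts P = .06 as significant would wrongly reject the null hypothesis for this good coin.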

In direct contradiction to what happens in the case of Type I 
errors, the possibility of drawing erroneous inferences of Type II 
(acceptance of the null hypothesis when false) is increased when we 
set very high levels of significance. This can be shown by reference 
to the coin example above—with a change in conditions. Suppose 
that a quarter which is known to us to be biased in favor of heads is 
also suspected by an experimenter of bias in favor of heads. This 
coin is tossed 10 times and shows, as did the coin before, 8 heads and 
2 tails. From the data above, on page 220, we know that in a good 
coin 8 or more heads can be expected by chance 6 in 100 times—that 
P = .06. Hence, if our experimenter sets .01 as his level of signifi- 
cance (or even .05) he will accept the null hypothesis and mark 
his result “not significant” although the coin is now actually 
biased. 

How can we guard against both of these types of erroneous infer- 
ence? Perhaps the wisest course is first to demand more evidence, 
that is, give the data a chance to refute (or fail to refute) the null 
hypothesis. Additional data, further repetition of the experiment, 
and better control will often make possible a definite conclusion. If 
a coin is biased toward returning heads, this bias will continue to 
cause more heads than tails to appear in further tosses. For example, 
if the ratio of 8 heads to 2 tails in the 10 tosses described in the last 
paragraph holds consistently, we shall get 80 heads and 20 tails in 
100 throws. The critical ratio for 100 tosses will be 5.9 * (as com- 
pared with 1.58 for 10 tosses), and the probability is far less than .01 
that 80 heads is a random fluctuation from the expected 50 heads. 
Our experimenter would correctly mark this result very significant— 
i.e., significant beyond the .01 level. 
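
The effect of additional data on Type II errors can be sketched numerically. The code first reproduces the CR of 5.9 for 80 heads in 100 tosses. It then estimates how often a one-tailed test at the .05 level (CR of 1.65 or more) would fail to detect a biased coin. The true probability of heads, .80, is an assumption made purely for illustration (it mirrors the observed 8-to-2 ratio); the text does not state how biased the coin is, and the function names are our own.

```python
import math

def cr_at_least(k: int, n: int) -> float:
    """CR for 'k or more heads' in n tosses of a good coin (p = 1/2)."""
    return (k - 0.5 - n * 0.5) / math.sqrt(n * 0.25)

def smallest_significant_count(n: int, cr_required: float) -> int:
    """Fewest heads in n tosses whose CR reaches cr_required."""
    k = 0
    while cr_at_least(k, n) < cr_required:
        k += 1
    return k

def type_ii_probability(n: int, true_p: float, cr_required: float) -> float:
    """Chance that a coin with P(heads) = true_p yields too few heads to reject."""
    k_crit = smallest_significant_count(n, cr_required)
    return sum(math.comb(n, i) * true_p ** i * (1.0 - true_p) ** (n - i)
               for i in range(k_crit))

cr_100 = cr_at_least(80, 100)
print(f"CR for 80 heads in 100 tosses = {cr_100:.1f}")

ASSUMED_TRUE_P = 0.80   # hypothetical bias, for illustration only
CR_05_ONE_TAILED = 1.65
beta_10 = type_ii_probability(10, ASSUMED_TRUE_P, CR_05_ONE_TAILED)
beta_100 = type_ii_probability(100, ASSUMED_TRUE_P, CR_05_ONE_TAILED)
print(f"Type II probability, 10 tosses:  {beta_10:.3f}")
print(f"Type II probability, 100 tosses: {beta_100:.6f}")
```

Under this assumed bias the 10-toss experiment misses the bias more often than not, while the 100-toss experiment almost never does.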

* When n = 100, p = .50, q = .50: 

$M = np = 50$ 
$\sigma = \sqrt{npq} = \sqrt{100 \times 1/2 \times 1/2} = 5$ 
$CR = \dfrac{79.5 - 50}{5} = 5.9$ 

Setting a high level of significance will tend, then, to prevent 
Type I errors but will encourage the appearance of Type II errors. 
Hence it appears that an experimenter must decide which kind of 
wrong inference he would rather avoid, as apparently he can pre- 
vent one type of error only at the risk of making the other more 
likely. In the long run, errors of Type I (rejecting a null hypothesis 
when true, by marking a non-significant difference significant) are 
perhaps more likely to prove serious in a research program in psy- 
chology than are errors of Type II. If an experimenter claims a 
significant finding erroneously, for instance, the fact that it is a 
positive result is likely to terminate the research, so that the error 
persists. When a high level of significance is demanded (.01, say) 
we may feel assured that significance will be claimed incorrectly not 
more than once in 100 trials. 

Errors of Type II (accepting the null hypothesis when false, i.e., 
when a real difference exists) must be watched for carefully when 
the experimental factor or factors are potentially dangerous. Thus, 
if one is studying the psychological effects of a drug suspected of 
inducing rather drastic emotional and temperamental changes, an 
error of Type II might well prove to be disastrous. Fortunately, the 
fact that a negative finding is inconclusive and often unsatisfactory 
may lead to further experimental work, and thus obviate somewhat 
the harm done by Type II errors. Especially is this true when the 
problem is important enough further to challenge the investigator. 

For many years it was customary for investigators in experimental 
psychology to demand critical ratios of 3.00 or more before marking 
a difference significant. This extremely high standard almost cer- 
tainly caused the null hypothesis to be accepted more often than it 
should have been—a Type II error on the side of conservatism. As 
a general rule it is probably wise to demand a significance level of at 
least .01 in most experimental research, i.e., to risk Type II errors 
by preventing those of Type I. But the .05 level is often satisfactory, 
especially in preliminary work. 


(5) RELIABILITY OF THE DIFFERENCE BETWEEN MEANS IN SMALL 
INDEPENDENT SAMPLES 


When the N's of two independent groups are small (less than 30, 
say) the SE of the difference between means should depend upon 
SD's calculated by the formula $SD = \sqrt{\dfrac{\Sigma x^2}{N - 1}}$ and the degrees of 
freedom in the two groups must be considered. Table D may then 
be used conveniently to test the significance of t,* which is the ap- 
propriate critical ratio to be used with small samples. An example 
will illustrate the procedures. 


Example (4) An interest test is administered to 6 boys in a 
Vocational class and to 10 boys in a Latin class. Is there a sig- 
nificant difference in mean score between the two groups? 
Scores are as follows: 

      Vocational Class                     Latin Class 
          N1 = 6                             N2 = 10 
Scores (X1)    x1     x1²         Scores (X2)    x2     x2² 
    28         −2       4             20         −4      16 
    35          5      25             16         −8      64 
    32          2       4             25          1       1 
    24         −6      36             34         10     100 
    26         −4      16             20         −4      16 
    35          5      25             28          4      16 
   ---                ---             31          7      49 
   180                110             24          0       0 
                                      27          3       9 
                                      15         −9      81 
                                     ---                --- 
                                     240                352 
   M1 = 30                           M2 = 24 

df = (N1 − 1) + (N2 − 1) = 5 + 9 = 14 

$SD = \sqrt{\dfrac{110 + 352}{14}} = 5.74$   by (52) 

$SE_D = 5.74 \sqrt{\dfrac{6 + 10}{6 \times 10}} = 5.74 \times .5164 = 2.96$   by (53) 

$t = \dfrac{(30 - 24) - 0}{2.96} = 2.03$ 

For 14 df, the .05 level (Table D) is 2.14; and the .01 level is 2.98. 


* t is a critical ratio in which a more exact estimate of the $\sigma_D$ is used. The 
sampling distribution of t is not normal when N is small (less than 50, say). 
t is a CR; but all CR's are not t's (see p. 215). 

The mean of the interest scores made by the 6 boys in the Voca- 
tional class is 30, and the mean of the interest scores made by the 
10 boys in the Latin class is 24. The mean difference of 6 is to be 
tested for significance. When two samples are small, as here, we 
get a better estimate of the "true" SD ($\sigma$ in the population) by pool- 
ing the sums of squares of the deviations taken around the means of 
the two groups and computing a single SD.* The justification for 
pooling is that under the null hypothesis no real mean difference ex- 
ists as between the two samples, which are assumed to have been 
drawn from the same parent population. We have, therefore, only 
one $\sigma$ (that of the common population) to estimate. Furthermore, by 
increasing N we get a more stable SD based upon all of our cases. 
The formula for computing this "pooled" SD and the formula for the 
SE of the difference are as follows: 


$$SD = \sqrt{\frac{\Sigma (X_1 - M_1)^2 + \Sigma (X_2 - M_2)^2}{(N_1 - 1) + (N_2 - 1)}} \qquad (52)$$

(SD when two small independent samples are pooled) 

$$SE_D = SD \sqrt{\frac{N_1 + N_2}{N_1 N_2}} \qquad (53)$$

(SE of the difference between means in small independent samples) 

In formula (52), $\Sigma (X_1 - M_1)^2 = \Sigma x_1^2$ is the sum of the squared 
deviations around the mean of Group 1; and $\Sigma (X_2 - M_2)^2 = \Sigma x_2^2$ 
is the sum of the squared deviations around the mean of Group 2. 
These sums of squares are combined to give a single SD. In Exam- 
ple (4) the sum of squares in the Vocational class around the mean 
of 30 is 110; and in the Latin class the sum of squares around the 
mean of 24 is 352. The df are ($N_1 - 1$) = 5, and ($N_2 - 1$) = 9.† By 
formula (52), therefore, the $SD = \sqrt{\dfrac{110 + 352}{14}}$ or 5.74. This SD 
serves as a measure of variability for each of the two groups. Thus 
the $SE_{M_1} = \dfrac{5.74}{\sqrt{6}}$ and the $SE_{M_2} = \dfrac{5.74}{\sqrt{10}}$ [by formula (39), p. 182]. 
Combining these two SE's by formula (51) we find that 
$SE_D = \sqrt{\dfrac{(5.74)^2}{6} + \dfrac{(5.74)^2}{10}} = 5.74 \sqrt{\dfrac{16}{60}}$ or 2.96. Formula (53) combines the 
two steps, enabling us to calculate $SE_D$ in one operation. 

t is 6/2.96 or 2.03; and the df in the two groups (namely, 5 and 9) 
are combined to give 14 df for use in inferring the significance of the 
mean difference. Entering Table D with 14 df, we get the entries 
2.14 at the .05 and 2.98 at the .01 levels. Since our t does not reach 
the .05 level, the obtained mean difference of 6 must be marked 
"non-significant." 

* The SD so computed is subject to a slight negative bias, which is negligible 
when N > 20, say. See Holtzman, W. H., "The Unbiased Estimate of the Pop- 
ulation Variance and Standard Deviation," Amer. Jour. Psychol., 1950, 63, 

† 1 df is "used up" in computing each mean (p. 193). 
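
Formulas (52) and (53) applied to the raw scores of Example (4) can be sketched as below. The function names are our own, and the critical value 2.14 is simply the Table D entry quoted in the text for 14 df.

```python
import math

vocational = [28, 35, 32, 24, 26, 35]
latin = [20, 16, 25, 34, 20, 28, 31, 24, 27, 15]

def mean(scores: list[float]) -> float:
    return sum(scores) / len(scores)

def sum_of_squares(scores: list[float]) -> float:
    """Sum of squared deviations around the group's own mean."""
    m = mean(scores)
    return sum((x - m) ** 2 for x in scores)

def pooled_sd(group_1: list[float], group_2: list[float]) -> float:
    """Formula (52): one SD from the pooled sums of squares."""
    df = (len(group_1) - 1) + (len(group_2) - 1)
    return math.sqrt((sum_of_squares(group_1) + sum_of_squares(group_2)) / df)

def se_difference(sd: float, n_1: int, n_2: int) -> float:
    """Formula (53): SE of the difference between two small-sample means."""
    return sd * math.sqrt((n_1 + n_2) / (n_1 * n_2))

sd = pooled_sd(vocational, latin)
se_d = se_difference(sd, len(vocational), len(latin))
t = (mean(vocational) - mean(latin)) / se_d
df = len(vocational) + len(latin) - 2

print(f"pooled SD = {sd:.2f}, SE_D = {se_d:.2f}, t = {t:.2f}, df = {df}")
print("significant at .05" if abs(t) >= 2.14 else "not significant at .05")
```

Because the code carries full precision, its SE and t may differ from the hand-computed 2.96 and 2.03 in the last decimal place; the conclusion is unchanged.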

A second example will illustrate further the use of levels of sig- 
nificance when samples are small. 

Example (5) On an arithmetic reasoning test 31 ten-year-old 
boys and 42 ten-year-old girls made the following scores: 

           Mean      SD      N 
Boys:     40.39    8.69     31 
Girls:    35.81    8.33     42 

Is the mean difference of 4.58 in favor of the boys significant? 
By formula (52) we find* 

$$SD = \sqrt{\frac{(8.69)^2 \times 30 + (8.33)^2 \times 41}{71}} \text{ or } 8.48.$$

And by formula (53), 

$$SE_D = 8.48 \sqrt{\frac{31 + 42}{31 \times 42}} = 2.01.$$

t is 4.58/2.01 or 2.28 and the degrees of freedom for use in testing the 
significance of the mean difference are 30 + 41 or 71. Entering 
Table D with 71 df we find t-entries of 2.00 at the .05 and of 2.65 at 
the .01 levels. The obtained t of 2.28 is significant at the .05 but not 
at the .01 level. Only once in 20 comparisons of boys and girls on 
this test would we expect to find a difference as large or larger than 
4.58 under our null hypothesis. We may be reasonably confident, 
therefore, that boys do better than girls on this test. 
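
When only the means, SD's and N's are available, as in the boys-and-girls example, the sums of squares needed by formula (52) can be recovered as SD² × (N − 1). The following sketch does this; the function name `pooled_t_from_summary` is hypothetical.

```python
import math

def pooled_t_from_summary(m_1: float, sd_1: float, n_1: int,
                          m_2: float, sd_2: float, n_2: int):
    """Return (pooled SD, SE of difference, t, df) from summary figures.

    Each sum of squares is recovered as SD^2 * (N - 1), then formulas
    (52) and (53) are applied.
    """
    df = (n_1 - 1) + (n_2 - 1)
    pooled_ss = sd_1 ** 2 * (n_1 - 1) + sd_2 ** 2 * (n_2 - 1)
    sd = math.sqrt(pooled_ss / df)
    se_d = sd * math.sqrt((n_1 + n_2) / (n_1 * n_2))
    t = (m_1 - m_2) / se_d
    return sd, se_d, t, df

sd, se_d, t, df = pooled_t_from_summary(40.39, 8.69, 31, 35.81, 8.33, 42)
print(f"pooled SD = {sd:.2f}, SE_D = {se_d:.2f}, t = {t:.2f}, df = {df}")
```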


* $SD^2 = \dfrac{\Sigma x^2}{N - 1}$; hence $\Sigma x^2 = SD^2 \times (N - 1)$. 


3. The reliability of the difference between two correlated means 


(1) THE SINGLE GROUP METHOD 


The last section dealt with the problem of determining whether the 
difference between two means is significant when these means repre- 
sent the performance of independent groups—boys and girls, Nor- 
wegians and Belgians, and the like. A closely related problem is con- 
cerned with the significance of the difference between correlated 
means obtained from the same test administered to the same group 
upon two occasions. This experimental design is called the "single 
group" method. Suppose that we have administered a test to a 
group of children and two weeks later have repeated the test. We 
wish to measure the effect of practice or of special training upon the 
second set of scores; or to estimate the effects of some activity inter- 
polated between test and retest. In order to determine the signifi- 
cance of the difference between the means obtained in the initial and 
final testing, we must use the formula 


$$SE_D = \sqrt{\sigma_{M_1}^2 + \sigma_{M_2}^2 - 2 r_{12} \sigma_{M_1} \sigma_{M_2}} \qquad (54)$$

(SE of the difference between correlated means) 

in which $\sigma_{M_1}$ and $\sigma_{M_2}$ are the SE's of the initial and final test means, 
and $r_{12}$ is the coefficient of correlation between scores made on initial 
and final tests.* An illustration will bring out the difference between 
formula (51) and formula (54). 


Example (6) At the beginning of the school year, the mean 
score of a group of 64 sixth-grade children upon an educational 
achievement test in reading was 45.00 with a $\sigma$ of 6.00. At the end 
of the school year, the mean score on an equivalent form of the 
same test was 50.00 with a $\sigma$ of 5.00. The correlation between 
scores made on the initial and final testing was .60. Has the class 
made significant progress in reading during the year? 


We may tabulate our data as follows: 


                                               Initial        Final 
                                                Test           Test 
No. of children:                                 64             64 
Mean score:                                  45.00 (M1)     50.00 (M2) 
Standard Deviations:                          6.00 (σ1)      5.00 (σ2) 
Standard errors of means:                     .75 (σM1)      .63 (σM2) 
Difference between means:                             5.00 
Correlation between initial and final tests:          .60 


Substituting in formula (54) we get 

$SE_D = \sqrt{(.75)^2 + (.63)^2 - 2 \times .60 \times .75 \times .63} = .63$ 


The t-ratio is 5.00/.63 or 7.9. Since there are 64 children there are 
64 pairs of scores and 64 differences, so that the df becomes 64 − 1 or 
63.† From Table D the t for 63 df is 2.66 at the .01 level. Our t of 7.9 
is far greater than 2.66 and hence is very significant. It seems clear, 
therefore, that this class made substantial progress in reading over 
the school year. 

* The correlation between the means of successive samples drawn from a 
given population equals the correlation between test scores, the means of which 
are being compared. 

† 1 df is lost since $SE_D$ is computed around the mean of the distribution of 
differences (p. 193). 
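
Formula (54) is easily sketched in code. The SE's of the means are entered as they appear in the tabulation (.75 and .63, already rounded); the function name is our own, and 2.66 is the Table D entry quoted in the text.

```python
import math

def se_diff_correlated(se_1: float, se_2: float, r_12: float) -> float:
    """Formula (54): SE of the difference between two correlated means."""
    return math.sqrt(se_1 ** 2 + se_2 ** 2 - 2.0 * r_12 * se_1 * se_2)

# Example (6): initial and final reading tests for the same 64 children.
se_initial, se_final, r_12 = 0.75, 0.63, 0.60
mean_gain = 50.00 - 45.00

se_d = se_diff_correlated(se_initial, se_final, r_12)
t = mean_gain / se_d
df = 64 - 1

print(f"SE_D = {se_d:.2f}, t = {t:.1f}, df = {df}")
print("significant at .01" if t >= 2.66 else "not significant at .01")
```

The t printed here comes from the unrounded SE and so may differ slightly from 5.00/.63; it remains far beyond the .01 level either way.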

When groups are small, a procedure called the "difference-method" 
is often to be preferred to that given above. The following example 
will serve as an illustration: 


Example (7) Twelve subjects are given 5 successive trials upon 
a digit-symbol test of which only the scores for trials 1 and 5 are 
shown. Is the gain from initial to final trial significant? 


                       Difference 
Trial 1    Trial 5      (5 − 1)        x        x² 
   50         62           12          4        16 
   42         40          − 2        −10       100 
   51         61           10          2         4 
   26         35            9          1         1 
   35         30          − 5        −13       169 
   42         52           10          2         4 
   60         68            8          0         0 
   41         51           10          2         4 
   70         84           14          6        36 
   55         63            8          0         0 
   62         72           10          2         4 
   38         50           12          4        16 
  ---        ---          ---                  --- 
  572        668           96                  354 

$\text{Mean}_D = \dfrac{96}{12} = 8.0$ 

$SD_D = \sqrt{\dfrac{354}{11}} = 5.67$ 

$SE_{M_D} = \dfrac{5.67}{\sqrt{12}} = 1.64$ 

$t = \dfrac{8 - 0}{1.64} = 4.88$ 

From the column of differences between pairs of scores, the mean 
difference is found to be 8, and the SD around this mean ($SD_D$) by 
the formula $SD = \sqrt{\dfrac{\Sigma x^2}{N - 1}}$ is 5.67. On our null hypothesis the true 
difference between the means of Trials 5 and 1 is 0, so that we must 
test our obtained mean gain of 8 against this hypothetical zero gain. 
The SE of the mean difference ($SE_{M_D} = SD_D/\sqrt{N}$) is 1.64 and t ($\dfrac{8 - 0}{1.64}$) 
is 4.88. Entering Table D with 11 (12 − 1) degrees of freedom, we 
find t-entries of 2.20 and 3.11 at the .05 and at the .01 levels. Our t 
of 4.88 is far above the .01 level and the mean difference of 8 is obvi- 
ously very significant. 
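
The difference-method reduces to a few steps on the column of paired differences. A minimal sketch, using the twelve differences of Example (7); the function name is our own.

```python
import math

# Differences (Trial 5 minus Trial 1) for the twelve subjects of Example (7).
differences = [12, -2, 10, 9, -5, 10, 8, 10, 14, 8, 10, 12]

def difference_method(diffs: list[float]):
    """Return (mean difference, SD of differences, SE of mean difference, t, df)."""
    n = len(diffs)
    mean_d = sum(diffs) / n
    sum_sq = sum((d - mean_d) ** 2 for d in diffs)
    sd_d = math.sqrt(sum_sq / (n - 1))
    se_md = sd_d / math.sqrt(n)
    t = (mean_d - 0.0) / se_md      # tested against a true gain of zero
    return mean_d, sd_d, se_md, t, n - 1

mean_d, sd_d, se_md, t, df = difference_method(differences)
print(f"mean D = {mean_d:.1f}, SD_D = {sd_d:.2f}, SE = {se_md:.2f}, "
      f"t = {t:.2f}, df = {df}")
```

Carried at full precision, t comes out within a hundredth or so of the hand-computed 4.88.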

If our hypothesis initially had been that practice increases test 
score, we would have used the one-tailed test. The probability of a 
positive difference (gain) of 8 or more on the null hypothesis is quite 
remote. In the one-tailed test, for 11 df the .05 level is read from the 
.10 column (P/2 = .05) to be 1.80 and the .01 level from the .02 
column (P/2 = .01) is 2.72. Our t of 4.88 is much larger than the .01 
level of 2.72 and there is little doubt but that the gain from Trial 1 to 
Trial 5 is significant. 

The result found in Example (7) may be checked by the single 
group method. By use of formula (27), p. 145, the r between Trials 1 
and 5 is found to be .944. Substituting for $r_{12}$ (viz., .944), for $\sigma_{M_1}$ 
(3.65) and for $\sigma_{M_2}$ (4.55) in formula (54) we get a $\sigma_D$ of 1.63 which 
checks $SE_{M_D}$ within the error of computation. The "difference- 
method" is quicker and easier to apply than is the longer method of 
calculating SE's for each mean and the SE of the difference, and is 
to be preferred unless the correlation between initial and final scores 
is wanted. 
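
The agreement between the two methods can be verified on the raw scores of Example (7). The sketch below computes r between Trials 1 and 5 by the ordinary product-moment definition (we assume this is what formula (27) gives), applies formula (54), and compares the result with the difference-method. All helper names are ours.

```python
import math

trial_1 = [50, 42, 51, 26, 35, 42, 60, 41, 70, 55, 62, 38]
trial_5 = [62, 40, 61, 35, 30, 52, 68, 51, 84, 63, 72, 50]
n = len(trial_1)

def mean(xs):
    return sum(xs) / len(xs)

def sum_sq_dev(xs):
    """Sum of squared deviations around the mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)

def sum_cross_dev(xs, ys):
    """Sum of cross-products of deviations."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))

# Single group method: r, the two SE's of the means, then formula (54).
r_12 = sum_cross_dev(trial_1, trial_5) / math.sqrt(
    sum_sq_dev(trial_1) * sum_sq_dev(trial_5))
se_m1 = math.sqrt(sum_sq_dev(trial_1) / (n - 1)) / math.sqrt(n)
se_m2 = math.sqrt(sum_sq_dev(trial_5) / (n - 1)) / math.sqrt(n)
se_single_group = math.sqrt(se_m1 ** 2 + se_m2 ** 2 - 2 * r_12 * se_m1 * se_m2)

# Difference-method on the same data.
diffs = [b - a for a, b in zip(trial_1, trial_5)]
se_difference_method = math.sqrt(sum_sq_dev(diffs) / (n - 1)) / math.sqrt(n)

print(f"r = {r_12:.3f}, SE_M1 = {se_m1:.2f}, SE_M2 = {se_m2:.2f}")
print(f"SE by formula (54):       {se_single_group:.4f}")
print(f"SE by difference-method:  {se_difference_method:.4f}")
```

With unrounded inputs the two SE's coincide; the small gap between 1.63 and 1.64 in the hand computation comes from rounding r and the SE's before substitution.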


(2) THE METHOD OF EQUIVALENT GROUPS: MATCHING BY PAIRS 


Formula (54) is applicable in those experiments which make use 
of equivalent groups as well as in those using a single group. In the 
method of equivalent groups the matching is done initially by pairs 
so that each person in the first group has a match in the second group. 
This procedure enables us to set off the effects of one or more experi- 
mentally varied conditions (experimental factors) against the ab- 
sence of these same variables (control). The following problem is 
typical of many in which the equivalent group technique is useful. 


Example (8) Two groups, X and Y, of seventh-grade children 
are paired child for child for age and score on Form A of the Otis 
Group Intelligence Scale. Three weeks later, both groups are given 
Form B of the same test. Before the second test, Group X, the ex- 
perimental group, is praised for its performance on the first test and 
urged to try to better its score. Group Y, the control group, is given 
the second test without comment. Will the incentive (praise) cause 
the final scores of Group X and Group Y to differ significantly? 

                                          Experimental      Control 
                                            Group X         Group Y 
No. of children in each group:                 72              72 
Mean scores on Form A, initial test:         80.42           80.51 
SD on Form A, initial test:                  23.61           23.46 
Mean scores on Form B, final test:           88.63 (M1)      83.24 (M2) 
SD on Form B, final test:                    24.36 (σ1)      21.62 (σ2) 
Gain, M1 − M2:                                        5.39 
Standard errors of means, final tests:        2.89            2.57 


Correlation between final scores (experimental and control groups) = .65 


The means and $\sigma$'s of the control and experimental groups in
Form A (initial test) are almost identical, showing the original pairing
of scores to have been quite satisfactory. The correlation between
the final scores on Form B of the Otis Test is calculated from
the paired scores of children who were matched originally in terms of
initial score.*

The difference between the means on the final test is 5.39
(88.63 - 83.24). The SE of this difference, $\sigma_D$, is found from formula
(54) to be


$$\sigma_D = \sqrt{(2.89)^2 + (2.57)^2 - 2 \times .65 \times 2.89 \times 2.57} = 2.30$$


The t-ratio is 5.39/2.30 or 2.34; and since there are 72 pairs, there are
(72 - 1) or 71 degrees of freedom. Entering Table D with 71 df we
find the t's at the .05 and .01 levels to be 2.00 and 2.65, respectively.
The given difference is significant at the .05 but not at the .01 level;
and we may feel reasonably certain that the experimental and control
groups differ in their final mean scores on Form B of the Otis Test.

It is worth noting that had no account been taken of the correlation
between final scores on Form B [if formula (51) had been used
instead of (54)], $\sigma_D$ would have been 3.87 instead of 2.30. The t would
then have been 1.39 instead of 2.34 and would have fallen considerably
below the .05 level of 2.00. In other words, a significant finding
would have been marked "not significant." Evidently, it is important
that we take account of the correlation between final scores,
especially if it is high.
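
The comparison between formulas (54) and (51) in Example (8) can be
retraced with a short Python sketch. The function and variable names
are illustrative assumptions, not part of the text, and intermediate
values are rounded to two decimals only to mirror the hand calculation.

```python
import math


def se_diff_correlated_means(se1, se2, r):
    """SE of the difference between two correlated means, as in formula (54)."""
    return math.sqrt(se1**2 + se2**2 - 2 * r * se1 * se2)


def se_diff_independent_means(se1, se2):
    """SE of the difference between two uncorrelated means, as in formula (51)."""
    return math.sqrt(se1**2 + se2**2)


# Values quoted in Example (8)
mean_x, mean_y = 88.63, 83.24   # final means, experimental and control
se_x, se_y = 2.89, 2.57         # standard errors of the two final means
r_final = 0.65                  # correlation between paired final scores
n_pairs = 72

diff = mean_x - mean_y
df = n_pairs - 1

# Rounded to two decimals to follow the worked example
se_paired = round(se_diff_correlated_means(se_x, se_y, r_final), 2)
t_paired = round(diff / se_paired, 2)

se_unpaired = round(se_diff_independent_means(se_x, se_y), 2)
t_unpaired = round(diff / se_unpaired, 2)

print(f"formula (54): SE = {se_paired}, t = {t_paired}, df = {df}")
print(f"formula (51): SE = {se_unpaired}, t = {t_unpaired}")
```

Under these assumptions the sketch gives the same pair of results as
the text: the t from formula (54) clears the .05 value of 2.00, while
the t from formula (51) does not.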

When r = .00, formula (54) reduces to (51) since group means
are then independent or uncorrelated. Also, when r is positive, the
$\sigma_D$ from formula (54) is smaller than the $\sigma_D$ from (51), and the larger
the plus r the greater the reduction in $\sigma_D$ by use of (54). For a given
difference between means, the smaller the $\sigma_D$ the larger the t and the
more significant the obtained difference. The relative efficiency
obtained by using a single group or equivalent groups as compared with
independent groups can be determined by the size of the r between
final scores, or between initial and final scores. The correlation
coefficient, therefore, gives a measure of the advantage to be gained by
matching.


* Note that the correlation between final scores in the equivalent groups
method is analogous to the correlation between initial and final scores in the
single group method. In equivalent groups one group is the experimental and
the other the control. In the single group, the initial scores furnish the control.

If r is negative, formula (54) gives a larger $\sigma_D$ than that given by
formula (51). In this case, the failure to take account of the correlation
will lead to a smaller $\sigma_D$ and a t larger and apparently more
significant than it should be.

One further point may be mentioned. If the difference between 
the means of two groups is significant by formula (51) it will, of 
course, be even more significant by formula (54) if r is positive. 
Formula (51) may be used in a preliminary test, therefore, if we can 
be sure that the correlation is positive. The correlation between 
initial and final score is usually positive, though rarely as high as 
that found in Example (8). 


(3) GROUPS MATCHED FOR MEAN AND SD 


When it is impracticable or impossible to set up groups in which
subjects have been matched person for person, investigators often
resort to the matching of groups in terms of mean and $\sigma$. The matching
variable is usually different from the variable under study but is,
in general, related to it and sometimes highly. No attempt is made
to pair off individuals and the two groups are not necessarily of the
same size, although a large difference in N is not advisable.

In comparing final score means of matched groups the procedure
is somewhat different from that used with equivalent groups.* Suppose
that X is the variable under study, and Y is the function or
variable in terms of which our two groups have been equated as to
mean and SD. Then if $r_{xy}$ is the correlation between X and Y in the
population from which our samples have been drawn, the SE of the
difference between means in X is


$$SE_D \text{ or } \sigma_{M_{X_1} - M_{X_2}} = \sqrt{\left(\sigma^2_{M_{X_1}} + \sigma^2_{M_{X_2}}\right)\left(1 - r^2_{xy}\right)} \qquad (55)$$


(SE of the difference between the X means of groups matched
for mean and for SD in Y)


* Wilks, S. S., "The Standard Error of the Means of 'Matched' Samples,"
Jour. Educ. Psychol., 1931, 22, 205-208.

Example (9) The achievement of two groups of first-year high-school
boys, the one from an academic, the other from a technical
high school, is compared upon a Mechanical Ability Test. The two
groups are matched for mean and SD upon a general intelligence
test so that the experiment becomes one of comparing the mechanical
ability scores of two groups of boys of "equal" general intelligence
enrolled in different curricula. Data are as follows:


                                            Academic    Technical
No. of boys in each group:                  125         137
Means on Intelligence Test (Y):             102.50      102.80
$\sigma$'s on Intelligence Test (Y):              33.65       31.62
Means on Mechanical Ability Test (X):       51.42       54.38
$\sigma$'s on Mechanical Ability Test (X):        6.24        7.14


Correlation between the General Intelligence Test and the Mechanical Ability
Test for first-year high-school boys is .30.


$$M_{X_2} - M_{X_1} = 54.38 - 51.42 = 2.96$$

By (55)

$$\sigma_D = \sqrt{\left(\frac{(6.24)^2}{125} + \frac{(7.14)^2}{137}\right)\left(1 - .30^2\right)} = .79$$

$$t = \frac{2.96}{.79} = 3.75$$


The difference between the mean scores in the Mechanical Ability
Test of the academic and technical high-school boys is 2.96 and the
$\sigma_D$ is .79. The t is 2.96/.79 or 3.75; and the degrees of freedom to be
used in testing this t are (125 - 1) + (137 - 1) - 1, or 259.* We
must subtract the one additional df to allow for the fact that our
groups were matched in variable Y. The general rule (p. 193) is that
1 df is subtracted for each restriction imposed upon the observations,
i.e., for each matching variable.
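
The arithmetic of Example (9) can be checked with the Python sketch
below. It assumes, as the worked figures suggest, that each squared SE
of a mean is taken as the squared SD divided by N; the function name
is hypothetical.

```python
import math


def se_diff_matched_groups(sd1, n1, sd2, n2, r_xy):
    """SE of the difference between X means of groups matched on Y, per formula (55).

    Assumes each squared SE of a mean is sd**2 / n.
    """
    var_mean_1 = sd1**2 / n1
    var_mean_2 = sd2**2 / n2
    return math.sqrt((var_mean_1 + var_mean_2) * (1 - r_xy**2))


# Values quoted in Example (9)
n_academic, n_technical = 125, 137
sd_academic, sd_technical = 6.24, 7.14
mean_academic, mean_technical = 51.42, 54.38
r_xy = 0.30   # correlation between intelligence and mechanical ability

se_diff = round(
    se_diff_matched_groups(sd_academic, n_academic, sd_technical, n_technical, r_xy), 2
)
t_ratio = round((mean_technical - mean_academic) / se_diff, 2)

# One df is lost per group for its mean, plus one for the matching variable
df = (n_academic - 1) + (n_technical - 1) - 1

print(f"SE = {se_diff}, t = {t_ratio}, df = {df}")
```

Setting `r_xy` to zero in this sketch returns the SE that formula (51)
would give for independent groups.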

Entering Table D with 259 df, we find that our t of 3.75 is larger
than the entry of 2.59 at the .01 level. The obtained difference in X
(mechanical ability), therefore, though small, is highly significant,
and boys in the technical high school are reliably better on the
Mechanical Ability Test than are boys of "equal" general intelligence
in the academic high school.

The correlation term must be introduced into formula (55) because
when two groups have been matched in some test or tests their
variability is restricted in all functions correlated with the matching
variable. Height and weight, for example, are highly correlated in
9-year-old boys. Therefore, if a group of 9-year-old boys of the same
or nearly the same height is selected, the variability in weight
of these children will be substantially reduced as compared with
9-year-old boys in general. When groups are matched for several
variables, e.g., age, intelligence, socioeconomic status, and the like,
and compared with respect to some correlated variable, the correlation
coefficient in formula (55) becomes a multiple coefficient of
correlation (p. 395). When $r_{xy}$ = .00, (55) reduces to (51): our
groups are independent and unrestricted by the matching variable.


* When df = 259, little is to be gained by using t instead of CR.

Groups matched for mean and $\sigma$ and equivalent groups in which
individuals are paired as to score have been widely used in a variety
of psychological and educational studies. Illustrations are found in
experiments designed to evaluate the relative merits of two methods
of teaching, the effects of drugs, e.g., tobacco or caffeine, upon
efficiency, transfer effects of special training, and the like. Other
techniques useful in assessing the role of experimental factors are
described in Chapter 10.


4. The reliability of the difference between uncorrelated medians 


The reliability of the difference between two medians obtained
from independent samples may be found from the formula


$$\sigma_{D_{Mdn}} \text{ or } \sigma_{Mdn_1 - Mdn_2} = \sqrt{\sigma^2_{Mdn_1} + \sigma^2_{Mdn_2}} \qquad (56)$$
(SE of the difference between two uncorrelated medians)


When medians are correlated, the value of $r_{12}$ cannot be determined
accurately and the reliability of the median cannot be readily
computed. When samples are not independent, therefore, it is better
procedure to use means instead of medians.


II. The Significance of the Difference between $\sigma$'s


1. The reliability of the difference between two standard deviations


(1) SE OF A DIFFERENCE WHEN $\sigma$'s ARE UNCORRELATED


In many studies in psychology and education, differences in
variability which appear among groups are a matter of considerable
importance. The student of race, sex, and experimentally induced
differences is oftentimes more interested in knowing whether his
groups differ significantly in SD than in knowing whether they differ
in mean achievement. And the educational psychologist who is
investigating a new way of teaching arithmetic may want to know
whether the new method has led to changes in variability greater
than those brought about by the old method.

When samples are independent, i.e., when different groups are
studied, or when tests given to the same group are uncorrelated, the
reliability of a difference between two $\sigma$'s may be found thus:


$$\sigma_{D_\sigma} \text{ or } \sigma_{\sigma_1 - \sigma_2} = \sqrt{\sigma^2_{\sigma_1} + \sigma^2_{\sigma_2}} \qquad (57)$$
(SE of the difference between uncorrelated $\sigma$'s when N's are large)

where $\sigma_{\sigma_1}$ is the SE of the first $\sigma$ and $\sigma_{\sigma_2}$ is the SE of the second $\sigma$
(p. 195).

By way of illustration, we may apply this formula to the data of
the Norwegians and Belgians on page 214. The $\sigma$ of the Norwegians'
scores on the combined scale was 2.47; of the Belgians' scores on the
same test, 2.42. Is this very small difference in variability
significant? Calling the $\sigma$ of the Norwegians' scores $\sigma_1$ and the $\sigma$ of the
Belgians' scores $\sigma_2$, we have


$$\sigma_{\sigma_1} = \frac{.71 \times 2.47}{\sqrt{N_1}} = .071 \quad \text{by (43)}$$

$$\sigma_{\sigma_2} = \frac{.71 \times 2.42}{\sqrt{N_2}} = .151$$

$$\sigma_{D_\sigma} = \sqrt{(.071)^2 + (.151)^2} = .167 \text{ or } .17 \text{ (to two decimals)}$$

($N_1$ and $N_2$ are the sizes of the two samples given on page 214.)


The obtained difference in the $\sigma$'s is .05 (2.47 - 2.42), and CR is
.05/.17 or .30. On the null hypothesis ($\sigma_1 - \sigma_2 = 0$), this CR (Table
D, last line) is far short of 1.96, the .05 level. As we suspected, the
obtained difference is clearly not significant; and there is no reason
to suspect that the two groups are not about equally variable.
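
A minimal Python sketch of formula (57) for this comparison follows.
It starts from the two SE's quoted in the text (.071 and .151) rather
than recomputing them, since the sample sizes are given on another
page; the names used are illustrative only.

```python
import math


def se_diff_uncorrelated_sds(se_sd1, se_sd2):
    """SE of the difference between two uncorrelated SDs, as in formula (57)."""
    return math.sqrt(se_sd1**2 + se_sd2**2)


sd_norwegians, sd_belgians = 2.47, 2.42
se_sd_norwegians, se_sd_belgians = 0.071, 0.151   # SE's quoted in the text

se_diff = se_diff_uncorrelated_sds(se_sd_norwegians, se_sd_belgians)
critical_ratio = (sd_norwegians - sd_belgians) / se_diff

print(f"SE of difference = {se_diff:.3f}, CR = {critical_ratio:.2f}")
print("significant at .05" if abs(critical_ratio) >= 1.96 else "not significant at .05")
```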

Formula (57) is adequate for testing the significance of the
difference between two uncorrelated SD's when N's are large (greater
than 50, say). But formula (57) is not accurate when N's are small,
as the SD's computed from small samples drawn at random from
the same normal population will exhibit a skewed distribution
around the population $\sigma$. (See Figure 47 for normal sampling
distribution of means.) Instead of testing the difference between two
SD's obtained from small independent samples by formula (57),
therefore, we divide the larger of the two variances ($SD^2$) by the
smaller and test the significance of this ratio, called F,* by the
one-tailed test. We then double the probability (P) so found, in order to
test the general (null) hypothesis, namely, that the two variances do
not differ.

We may illustrate the method of using the F-ratio with Example
(4), page 223, in which $N_1$ = 6 and $N_2$ = 10, and the sums of squares
around the two means are $\Sigma x_1^2$ = 110 and $\Sigma x_2^2$ = 352, respectively.
The first variance (viz., $SD_1^2$) is 110/5 or 22; and the second variance
($SD_2^2$) is 352/9 or 39.1. The F-ratio found by dividing the
larger by the smaller variance is, then, 39.1/22 or 1.78; and entering
Table F with $n_1$ = 9 (df of larger variance) and $n_2$ = 5 (df of
smaller variance), we get the two entries 4.78 and 10.15. As given in
the table, the first of these is the F-ratio significant at the .05 level,
and the second is the F-ratio significant at the .01 level. However,
since we have used the one-tailed test (have divided only the larger
variance by the smaller), these two F-ratios, viz., 4.78 and 10.15,
really represent the .10 and the .02 levels of confidence (see p. 217).
Our F of 1.78 falls far below the smaller of these values (namely,
4.78) and hence is not significant at the .10 level, much less at the
.05 or .01 levels. There is no evidence, therefore, that the two groups
really differ with respect to variability.
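
The F-ratio of this example can be reproduced with the sketch below.
The critical values are simply the two Table F entries quoted in the
text, hard-coded for illustration; the function name is an assumption.

```python
def variance_ratio(sum_sq_1, n1, sum_sq_2, n2):
    """Return (F, df of larger variance, df of smaller variance).

    Each variance is the sum of squares about the mean divided by N - 1,
    and the larger variance is always placed in the numerator.
    """
    var1 = sum_sq_1 / (n1 - 1)
    var2 = sum_sq_2 / (n2 - 1)
    if var1 >= var2:
        return var1 / var2, n1 - 1, n2 - 1
    return var2 / var1, n2 - 1, n1 - 1


f_ratio, df_larger, df_smaller = variance_ratio(110, 6, 352, 10)

# Table F entries quoted in the text for 9 and 5 df. Because only the larger
# variance is placed on top, they mark the .10 and .02 levels for the
# two-sided hypothesis of equal variances.
F_TABLE_05, F_TABLE_01 = 4.78, 10.15

print(f"F = {f_ratio:.2f} with {df_larger} and {df_smaller} df")
print("exceeds first table entry" if f_ratio >= F_TABLE_05 else "below first table entry")
```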


(2) SE OF A DIFFERENCE WHEN $\sigma$'s ARE CORRELATED


When we compare the $\sigma$'s of the same group upon two occasions or
the $\sigma$'s of equivalent groups on a final test, we must take into account
possible correlation between the $\sigma$'s in the two groups being compared.
The formula for testing the significance of an obtained difference
in variability when SD's are correlated is


$$\sigma_{D_\sigma} = \sqrt{\sigma^2_{\sigma_1} + \sigma^2_{\sigma_2} - 2 r^2_{12}\, \sigma_{\sigma_1} \sigma_{\sigma_2}} \qquad (58)$$


(SE of the difference between correlated $\sigma$'s when N's are large)

where $\sigma_{\sigma_1}$ and $\sigma_{\sigma_2}$ are the SE's of the two SD's and $r^2_{12}$ is the square †
of the coefficient of correlation between scores in initial and final
tests or between final scores of equivalent groups.


* See pages 278-281 for explanation of the F-ratio; and page 429 for the table
of F.
† The correlation between the SD's of samples drawn from a given population
equals the square of the coefficient of correlation between the test scores,
the SD's of which are being compared.


Formula (58) may be applied to the problems on page 226 by
way of illustration. In the first problem, the SD of 64 sixth-grade
children was 6.0 on the initial and 5.0 on the final test. Is there a
significant drop in variability in reading after a year's schooling?
Putting $\sigma_1$ = 6.0 and $\sigma_2$ = 5.0, we have


$$\sigma_{\sigma_1} = \frac{.71 \times 6.0}{\sqrt{64}} = .53 \quad \text{by (43)}$$

$$\sigma_{\sigma_2} = \frac{.71 \times 5.0}{\sqrt{64}} = .44$$


The coefficient of correlation between initial and final scores is .60,
so that $r^2_{12}$ = .36. Substituting for $r^2_{12}$ and the two SE's in formula (58)
we have


$$\sigma_{D_\sigma} = \sqrt{(.53)^2 + (.44)^2 - 2 \times .36 \times .53 \times .44} = .55$$


The difference between the two $\sigma$'s is 1.0 and the SE of this difference
is .55. Therefore, on the null hypothesis of equal $\sigma$'s,

$$t = \frac{(6 - 5) - 0}{.55}$$

or 1.80. Entering Table D with 63 df, we find t at the .05 level to be
2.00. The obtained t does not quite reach this point, and there is no
reason to suspect a true difference in variability between initial and
final reading scores.

In the equivalent groups problem on page 228, the SD of the
experimental group on the final test was 24.36 and the SD of the control
group on the final test was 21.62. The difference between these
SD's is 2.74 and the number of children in each group is 72. Did
the incentive (praise) produce significantly greater variability in the
experimental group as compared with the control? Putting
$\sigma_1$ = 24.36 and $\sigma_2$ = 21.62, we have


$$\sigma_{\sigma_1} = \frac{.71 \times 24.36}{\sqrt{72}} = 2.04 \quad \text{by (43)}$$

$$\sigma_{\sigma_2} = \frac{.71 \times 21.62}{\sqrt{72}} = 1.81$$


The r between final test scores in the experimental and control
groups is .65 and $r^2_{12}$, therefore, is .42. Substituting for $r^2_{12}$ and the
two SE's in formula (58) we have

$$\sigma_{D_\sigma} = \sqrt{(2.04)^2 + (1.81)^2 - 2 \times .42 \times 2.04 \times 1.81} = 2.08$$

Dividing 2.74 by 2.08, our t is 1.32; and for 71 degrees of freedom
this t falls well below the .05 level of 2.00. There is no evidence,
therefore, that the incentive increased variability of response to the
test.
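
Both applications of formula (58) can be retraced with the Python
sketch below. The helper names are hypothetical, and the two SE's are
rounded to two decimals before substitution only to follow the hand
calculation.

```python
import math


def se_of_sd(sd, n):
    """Large-sample SE of an SD, .71 * sd / sqrt(n), as in formula (43)."""
    return 0.71 * sd / math.sqrt(n)


def se_diff_correlated_sds(se_sd1, se_sd2, r):
    """SE of the difference between two correlated SDs, as in formula (58).

    r is the correlation between the scores; the SDs correlate as r squared.
    """
    return math.sqrt(se_sd1**2 + se_sd2**2 - 2 * r**2 * se_sd1 * se_sd2)


def sd_difference_test(sd1, sd2, n, r):
    """Return (SE of the difference, t) for two correlated SDs based on n cases."""
    se1 = round(se_of_sd(sd1, n), 2)
    se2 = round(se_of_sd(sd2, n), 2)
    se_diff = se_diff_correlated_sds(se1, se2, r)
    return se_diff, (sd1 - sd2) / se_diff


# Single group: 64 children, initial vs. final reading SD, r = .60
se_reading, t_reading = sd_difference_test(6.0, 5.0, 64, 0.60)

# Equivalent groups: 72 pairs, experimental vs. control final SD, r = .65
se_praise, t_praise = sd_difference_test(24.36, 21.62, 72, 0.65)

print(f"reading: SE = {se_reading:.2f}, t = {t_reading:.2f}")
print(f"praise:  SE = {se_praise:.2f}, t = {t_praise:.2f}")
```

Because the sketch carries more decimals than the hand calculation,
the first t comes out near 1.8 rather than exactly at the printed
figure; in neither case does it reach the 2.00 required at the .05 level.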


III. The Significance of the Difference between Percentages
and Correlation Coefficients


1. The reliability of the difference between two percents 


(1) SE OF THE DIFFERENCE WHEN PERCENTS ARE UNCORRELATED

On page 196, the formula for the SE of a percentage was given as


$$SE_P = \sqrt{\frac{PQ}{N}}$$


where P = percent occurrence of the observed behavior,
Q = (1 - P), and N is the size of the sample. One of the most useful
applications of the SE formula is in determining the significance of
the difference between two percents. In much experimental work,
especially in social and abnormal psychology, we are able to get the
percent occurrence of a given behavior in two or more independent
samples. We then want to know whether the incidence of this
behavior is reliably different in the two groups. The following problem,
which repeats part of Example (1), page 196, will provide an
illustration.


Example (1) In a study of cheating * among elementary-school
children, 144 or 41.4% of 348 children from homes of good
socioeconomic status were found to have cheated on various tests. In
the same study, 133 or 50.2% of 265 children from homes of poor
socioeconomic status also cheated on the same tests. Is there a
true difference in the incidence of cheating in these two groups?


Let us set up the hypothesis that no true difference exists as
between the percentages cheating in the two groups and that, with
respect to cheating, both samples have been randomly drawn from
the same population. A useful procedure in testing this null
hypothesis is to consider $P_1$ (41.4%) and $P_2$ (50.2%) as being
independent determinations of the common population parameter, P; and
to estimate P by pooling $P_1$ and $P_2$ (see p. 224). A pooled estimate
of P is obtained from the equation:


$$P = \frac{N_1 P_1 + N_2 P_2}{N_1 + N_2}$$


Q being, of course, (1 - P).


* Data from Hartshorne, H., and May, M. A., Studies in Deceit (New York:
Macmillan, 1928), Book II, p. 161.


The estimated percentages, P and Q, may now be put in formula
(59) to give the SE of the difference between $P_1$ and $P_2$.


$$\sigma_{D_P} \text{ or } \sigma_{P_1 - P_2} = \sqrt{\sigma^2_{P_1} + \sigma^2_{P_2}} = \sqrt{PQ\left[\frac{1}{N_1} + \frac{1}{N_2}\right]} \qquad (59)$$


(SE of the difference between two uncorrelated percentages)

In the present example,

$$P = \frac{348 \times 41.4 + 265 \times 50.2}{348 + 265} \text{ or } 45.2\%$$

and Q = (1 - P) or 54.8%. Substituting these two values in (59)
we get


$$\sigma_{P_1 - P_2} = \sqrt{45.2 \times 54.8 \left[\frac{1}{348} + \frac{1}{265}\right]} = 4.06\%$$


The difference between the two percents $P_1$ and $P_2$ is 8.8%
(50.2 - 41.4); and dividing by 4.06

$$\left(CR = \frac{(P_1 - P_2) - 0}{\sigma_{P_1 - P_2}}\right)$$

we get a CR of 2.17. Entering Table D, last line (there are 611 df), we find
that our CR exceeds 1.96 (.05 level) but does not reach 2.58 (.01
level). We can be reasonably confident, therefore, that our two
groups do not come from a common population and that the
occurrence of cheating in the two groups is reliably different.
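
The pooled estimate and the critical ratio for the cheating data can
be sketched in Python as follows. The function names are illustrative
assumptions, and percentages are kept on the 0-100 scale to match the
text.

```python
import math


def pooled_percent(p1, n1, p2, n2):
    """Pooled estimate of the common population percentage P."""
    return (n1 * p1 + n2 * p2) / (n1 + n2)


def se_diff_uncorrelated_percents(p_pooled, n1, n2):
    """SE of the difference between two uncorrelated percents, as in formula (59)."""
    q_pooled = 100.0 - p_pooled
    return math.sqrt(p_pooled * q_pooled * (1 / n1 + 1 / n2))


# Percent cheating and sample size in each group, from Example (1)
p_good, n_good = 41.4, 348
p_poor, n_poor = 50.2, 265

p_pooled = pooled_percent(p_good, n_good, p_poor, n_poor)
se_diff = se_diff_uncorrelated_percents(p_pooled, n_good, n_poor)
critical_ratio = (p_poor - p_good) / se_diff

print(f"pooled P = {p_pooled:.1f}%, SE = {se_diff:.2f}%, CR = {critical_ratio:.2f}")
```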


(2) SE OF THE DIFFERENCE WHEN PERCENTS ARE CORRELATED

Responses recorded in percentages may be, and usually are, correlated
when individuals have been paired or matched in some attribute;
or when the same group gives answers (e.g., "Yes" or "No")
to the same questions or items. To illustrate with an example:

Example (2) A large group of veterans (250 *) answered as follows
the two questions:

1. Do you have a great many bad headaches?                    Yes 150   No 100
2. Are you troubled with fears of being crushed in a crowd?   Yes 125   No 125

                        Question 1
                     No      Yes     Total
Question 2   Yes     25      100     125
             No      75       50     125
             Total  100      150     250

                        Question 1
                     No          Yes         Total
Question 2   Yes     10% (b)     40% (a)      50%
             No      30% (d)     20% (c)      50%
             Total   40%         60%         100%


* The data have been simplified for illustrative purposes.

The data in the first 2 × 2 diagram show the number who
answered "Yes" to both questions, "No" to both questions, "Yes" to
one and "No" to the other. In the second diagram
frequencies are expressed as percents of 250. The letters a, b, c, and d
are to designate the four cells (p. 363). We find that a total of 60%
answered "Yes" to Question 1, and that a total of 50% answered
"Yes" to Question 2. Is this difference between the questions
significant?

The general formula for the significance of the difference between
two correlated percents is


$$\sigma_{P_1 - P_2} = \sqrt{\sigma^2_{P_1} + \sigma^2_{P_2} - 2 r_{P_1 P_2}\, \sigma_{P_1} \sigma_{P_2}} \qquad (60)$$
(SE of difference between two correlated percents)


in which r between the two percents is given by the phi-coefficient
(p. 367), a ratio equivalent to the correlation coefficient in 2 × 2
tables.

If $P_1$ and $P_2$ have been averaged in order to provide an estimate of
P, the population parameter, formula (60) becomes


$$\sigma_{P_1 - P_2} = \sqrt{2\sigma^2_P \left(1 - r_{P_1 P_2}\right)} \qquad (61)$$


(SE of the difference between two correlated percents when
P is estimated from $P_1$ and $P_2$)


In example (2), $P_1$ = 60% and $P_2$ = 50%, so that P = 55% and
Q = 45%. Substituting in (61) we have that


$$\sigma_{P_1 - P_2} = \sqrt{\frac{2 \times .55 \times .45}{250}\,(1 - .408)} \; * \; = .0342$$


The obtained difference of .10 (.60 - .50) divided by .034 gives a
CR of 2.94. From Table D, last line, we find that this critical ratio
exceeds 2.58, the .01 level. We abandon the null hypothesis,
therefore, and conclude that our groups differed significantly in their
answers to the two questions.

A simpler formula than (61) which avoids the calculation of the
correlation coefficient may be used when P has been estimated from
$P_1$ and $P_2$ under the null hypothesis. This formula † is


$$\sigma_{D_P} = \sqrt{\frac{b + c}{N}} \qquad (62)$$


(SE of the difference between two correlated percentages)


In example (2) we read from the second diagram that c = 20%
and b = 10%, N being 250. Substituting in (62) we have


$$\sigma_{D_P} = \sqrt{\frac{.10 + .20}{250}} = .0346$$


which checks the result obtained from (61).


* The phi-coefficient of .408 was found from formula (93), page 367.
† McNemar, Q., "Note on the Sampling Error of the Difference between
Correlated Proportions or Percentages." Psychometrika, 1947, 12, 153-157.
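
Formulas (61) and (62) can be compared on the veterans' data with the
Python sketch below. The phi-coefficient is computed here with the
usual fourfold formula, which is assumed to agree with formula (93);
the two cell counts of 100 and 50 are derived from the marginal totals,
and all names are illustrative.

```python
import math

# Cell counts for Example (2); 100 and 50 follow from the marginal totals
both_yes = 100        # "Yes" to both questions
q1_yes_q2_no = 50     # cell c, 20% of 250
q1_no_q2_yes = 25     # cell b, 10% of 250
both_no = 75          # "No" to both questions
n = both_yes + q1_yes_q2_no + q1_no_q2_yes + both_no

q1_yes = both_yes + q1_yes_q2_no
q1_no = n - q1_yes
q2_yes = both_yes + q1_no_q2_yes
q2_no = n - q2_yes

# Phi-coefficient of the 2 x 2 table (usual fourfold formula)
phi = (both_yes * both_no - q1_yes_q2_no * q1_no_q2_yes) / math.sqrt(
    q1_yes * q1_no * q2_yes * q2_no
)

# Formula (61): P is the average of the two "Yes" proportions
p1, p2 = q1_yes / n, q2_yes / n
p_avg = (p1 + p2) / 2
se_61 = math.sqrt(2 * (p_avg * (1 - p_avg) / n) * (1 - phi))

# Formula (62): only the two cells in which the answers disagree are needed
b, c = q1_no_q2_yes / n, q1_yes_q2_no / n
se_62 = math.sqrt((b + c) / n)

print(f"phi = {phi:.3f}, SE by (61) = {se_61:.4f}, SE by (62) = {se_62:.4f}")
```

The two standard errors agree to about three decimal places, which is
the sense in which (62) checks (61).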


2. The reliability of the difference between two r's 


A useful and mathematically exact method of determining the SE
of the difference between two r's requires that we first convert the
r's into Fisher's z-function. The significance of the difference
between two z's is then determined. The formula for the SE of the
difference between two z's is


$$\sigma_{D_z} = \sigma_{z_1 - z_2} = \sqrt{\frac{1}{N_1 - 3} + \frac{1}{N_2 - 3}} \qquad (63)$$
(SE of the difference between two z coefficients)

where $\sigma_z = \dfrac{1}{\sqrt{N - 3}}$ *

The following example will illustrate the procedure. 

Example (3) The r between intelligence and achievement in the
freshman class of College A is .40, for N = 400. And the r between
intelligence and achievement in the freshman class of College B is
.50 for N = 600. Is the relationship between intelligence and
achievement higher in College B than in College A?


From Table C we read that r's of .40 and .50 correspond to z's of
.42 and .55, respectively. If we put $N_1$ = 400 and $N_2$ = 600, we have
on substituting in (63)


$$\sigma_{z_1 - z_2} = \sqrt{\frac{1}{(400 - 3)} + \frac{1}{(600 - 3)}} = .065$$


* The two correlated variables take away 2 degrees of freedom; and the
transformation into z adds another restriction. Hence we subtract 3 from each
N (see p. 193).

Dividing .13 (.55 - .42) by .065, we obtain a CR of 2.00. This CR
exceeds slightly the value 1.96 and hence is significant at the .05
level. Based on the evidence we have, the r = .50 in College B is
reliably higher than the r = .40 in College A.

The z transformation for r is especially useful when r's are
very high, as the sampling distributions of such r's are known to be
skewed, often badly so. To illustrate, suppose that r between two
achievement tests is .87 in Grade 6 ($N_1$ = 50) and that the r between
the same tests is .72 in Grade 7 ($N_2$ = 65). Is there a significant
difference between these two r's?

From Table C we find that r's of .87 and .72 yield z's of 1.33 and
.91, respectively; and substituting $N_1$ and $N_2$ in formula (63) we
have


$$\sigma_{z_1 - z_2} = \sqrt{\frac{1}{47} + \frac{1}{62}} = .193$$


Dividing .42 (1.33 - .91) by .193 we get a CR of 2.18, well above
the .05 level of 1.96 but below the .01 level of 2.58. We may discard
the null hypothesis, therefore, and mark the difference between our
r's significant at the .05 level.
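
The two comparisons above can be retraced with the Python sketch
below. It assumes that Table C tabulates the inverse hyperbolic
tangent of r to two decimals, so `math.atanh` stands in for the table
lookup; the function names and the `table_rounding` switch are
illustrative, not part of the text.

```python
import math


def fisher_z(r):
    """Fisher's z transformation of r (the quantity assumed to be in Table C)."""
    return math.atanh(r)


def se_diff_z(n1, n2):
    """SE of the difference between two independent z's, as in formula (63)."""
    return math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))


def critical_ratio(r1, n1, r2, n2, table_rounding=True):
    """CR for the difference between two independent r's.

    With table_rounding=True the z's are rounded to two decimals and the SE
    to three, which mirrors the hand calculation from a printed table.
    """
    z1, z2 = fisher_z(r1), fisher_z(r2)
    se = se_diff_z(n1, n2)
    if table_rounding:
        z1, z2, se = round(z1, 2), round(z2, 2), round(se, 3)
    return (z1 - z2) / se


cr_colleges = critical_ratio(0.50, 600, 0.40, 400)
cr_grades = critical_ratio(0.87, 50, 0.72, 65)
cr_colleges_unrounded = critical_ratio(0.50, 600, 0.40, 400, table_rounding=False)

print(f"Colleges A and B: CR = {cr_colleges:.2f}")
print(f"Grades 6 and 7:   CR = {cr_grades:.2f}")
print(f"Colleges A and B, unrounded z's: CR = {cr_colleges_unrounded:.2f}")
```

When the z's are carried to more decimals than the table gives, the
CR for the two colleges comes out at about 1.94, so a result this close
to 1.96 is sensitive to rounding.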

Measurement of the significance of the difference between two r's
obtained from the same sample presents certain complications, as r's
from the same group are presumably correlated. Formulas for
computing the correlation between two correlated r's are not entirely
satisfactory and there is no method of determining the correlation
between two z's directly. Fortunately, if the r's are positively
correlated in our group, and the CR as determined by the SE from (63)
is significant, we may feel sure that the CR would be even more
significant if the correlation between the r's were known.

The z-transformation can be usefully employed when r's which
differ widely in size are to be averaged or combined (p. 198).


IV. The Significance of Deviations from Normality 


Distributions which show deviations from the normal form are
said to exhibit skewness or kurtosis or both. Skewed distributions
are asymmetric or off-center, shifted to the right or left (Figs. 23
and 24, p. 98); while distributions showing kurtosis are more
flattened or peaked than the normal (Fig. 25, p. 100). In many studies
the investigator wants to know whether his distributions are too
atypical or deviant to be treated as normal, or whether their
departures from normality are relatively mild and non-significant.
Exact tests of the significance of various degrees of skewness or
kurtosis will be found in more advanced text books.* The approximate
tests of significance given in this section are accurate enough
for many purposes and are relatively easy to apply.


1. The reliability of the percentile measure of skewness 


On page 99, the following formula was given for estimating the
skewness of a frequency distribution in terms of its median and
certain percentiles:


$$Sk = \frac{(P_{90} + P_{10})}{2} - P_{50} \qquad (20)$$


According to this formula, the skewness of the 50 Army Alpha scores
in Table 1, page 5, is -2.50. The problem, then, is to determine
whether this degree of skewness represents a significant departure
from zero, the skewness of the normal curve. The SE of the measure
of skewness given above is


$$\sigma_{Sk} = \frac{.5185\, D}{\sqrt{N}} \qquad (64)$$


[SE of the measure of skewness given in formula (20)]


in which D = ($P_{90} - P_{10}$).

In the frequency distribution of the 50 Army Alpha scores,
$P_{90}$ = 187, $P_{10}$ = 152, and D = 35. From formula (64), therefore,

$$\sigma_{Sk} = \frac{.5185 \times 35}{\sqrt{50}} = 2.57$$

The deviation of our measure of skewness from 0 skewness is -2.50,
and dividing -2.50 by 2.57 (CR = $Sk/\sigma_{Sk}$) we get a CR of -.97.
Note that the minus sign of -2.50 indicates simply the direction of
skewness. Our Sk, therefore, deviates -.97 $\sigma_{Sk}$ from 0, the measure
of skewness in the normal curve. From Table D we find that -.97
falls well within the ±1.96 limits, which determine the .05 level of
significance. Hence it is clear that -2.50 represents no real deviation
of this frequency distribution from normality.


* Johnson, Palmer O., Statistical Methods in Research (New York: Prentice-
Hall, Inc., 1949), Chap. 7.
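
The test of skewness for the Army Alpha scores can be sketched in
Python as follows. The skewness value of -2.50 is taken as given in
the text rather than recomputed, and the function name is an
illustrative assumption.

```python
import math


def se_percentile_skewness(p10, p90, n):
    """SE of the percentile measure of skewness, .5185 * D / sqrt(N), as in formula (64)."""
    spread = p90 - p10   # D in the text
    return 0.5185 * spread / math.sqrt(n)


# 50 Army Alpha scores: percentiles and skewness as quoted in the text
p10, p90, n_scores = 152, 187, 50
skewness = -2.50

se_sk = se_percentile_skewness(p10, p90, n_scores)
critical_ratio = skewness / se_sk

print(f"SE of Sk = {se_sk:.2f}, CR = {critical_ratio:.2f}")
print("significant at .05" if abs(critical_ratio) >= 1.96 else "not significant at .05")
```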

The skewness of the distribution of 200 cancellation scores (p. 99)
is .03 by formula (20). Since $P_{90}$ = 128.5, $P_{10}$ = 110.4, and D = 18.1,
the SE of Sk is


$$\sigma_{Sk} = \frac{.5185 \times 18.1}{\sqrt{200}} = .66$$

Dividing .03 by .66, we get .046; and from Table D we find that this
CR is far short of 1.96, the .05 level of significance. In fact, this
distribution is almost perfectly symmetrical as is shown in Figure 5,
page 18.


2. The reliability of the percentile measure of kurtosis


The formula below for measuring kurtosis in terms of Q and certain
percentiles in the distribution was given on page 100:


$$Ku = \frac{Q}{(P_{90} - P_{10})} \qquad (21)$$


The kurtosis of the frequency distribution of 50 Army Alpha scores
(p. 00) by formula (21) is .237; and this Ku deviates -.026 from
.263, the Ku of the normal distribution (p. 100). The negative
direction of the deviation indicates that the distribution tends toward
leptokurtosis.

To estimate the significance of the deviation of -.026 from the Ku of
the normal curve, we may calculate the SE of Ku by the following
formula:


$$\sigma_{Ku} = \frac{.28}{\sqrt{N}} \qquad (65)$$
[SE of the measure of Ku given by formula (21)]


in which N is, of course, the size of the sample.


For the 50 Army Alpha scores (p. 5), $\sigma_{Ku} = .28/\sqrt{50}$ or .039, and
the CR (the deviation of Ku from .263, divided by $\sigma_{Ku}$) is -.026/.039
or -.67. This CR is less than 1.96,
the .05 significance level, and there is no evidence, so far as our test
is concerned, that this distribution is really more peaked than the
normal.

The kurtosis of the 200 cancellation scores (p. 13) is .223 by
formula (21). This Ku deviates -.040 from .263, the Ku of the
normal curve. Again the direction of the deviation is toward
leptokurtosis. The SE of our Ku of .223 is .020 by formula (65); and
the CR is -.040/.020 or -2.00. Deviation from normal kurtosis is
slightly greater than 1.96, the .05 significance level, but less than
2.58, the .01 significance level. The narrow dispersion of this
distribution (Q = 4.04) and the fairly large N lead to a heavy
concentration of cases in the middle range; and these factors could well
account for the strong tendency of this distribution to be more
peaked than the normal. Leptokurtosis is not apparent in the curve
itself (Fig. 5, p. 18).


PROBLEMS 


1. The difference between two means is 3.60 and $\sigma_D$ = 3. Both samples
are larger than 100.
(a) Is the obtained difference significant at the .05 level? 
(b) What percent is the obtained difference of the difference necessary 
for significance at the .01 level? 
(c) Find the limits of the .99 confidence-interval for the true difference. 


2. A personality inventory is administered in a private school to 8 boys
whose conduct records are exemplary, and to 5 boys whose records are
very poor. Data are given below.


Group 1: 110 112 95 105 111 97 112 102
Group 2: 115 112 109 112 117


Is the difference between group means significant at the .05 level?
at the .01 level?


3. In which of the following experimental problems would it be more 
important to avoid Type I errors of inference than Type II errors in 
determining the significance of a difference? 

(a) Sex differences in reading rate and comprehension in the fifth 
grade. 

(b) Effects of a new drug upon reaction time—especially when the 
drugs are potent and probably dangerous. 

(c) Comparison of two methods of learning a new skill. 

(d) Acceptance of a program which involves much time and money 
and rejection of a less expensive program. 

(e) Comparative efficiency of a speed-up and a normal rate of work 
in a factory. 


4. In the first trial of a practice period, 25 twelve-year-olds have a mean
score of 80.00 and a SD of 8.00 upon a digit-symbol learning test. On the
tenth trial, the mean is 84.00 and the SD is 10.00. The r between scores
on the first and tenth trials is .40. Our hypothesis is that practice leads
to gain.

(a) Is the gain in score significant at the .05 level? at the .01 level?
(p. 217)

(b) What gain would be significant at the .01 level, other conditions
remaining the same?


5. Two groups of high-school pupils are matched for initial ability in a
biology test. Group 1 is taught by the lecture method, and Group 2
by the lecture-demonstration method. Data are as follows:


                                                  Group 1      Group 2
                                                  (control)    (experimental)

N                                                 60           60
Mean initial score on the biology test            42.30        42.50
$\sigma$ of initial scores on the biology test           5.36         5.38
Mean final score on the biology test              54.54        56.74
$\sigma$ of final scores on the biology test             6.34         7.25
r (between final scores on the biology test) = .50


(a) Is the difference between the final scores made by Groups 1 and 2 
upon the biology test significant at the .05 level? at the .01 level? 

(b) Determine the limits of the .95 confidence-interval for the true 
difference. 

(c) Is the difference in the variability of the final scores made by 
Groups 1 and 2 significant at the .05 level? 


6. Two groups of high-school students are matched for M and $\sigma$ upon
a group intelligence test. There are fifty-eight subjects in Group A and
seventy-two in Group B. The records of these two groups upon a
battery of "learning" tests are as follows:


            Group A    Group B
M           48.52      53.61
$\sigma$           10.60      15.35
N           58         72


The correlation of the group intelligence test and the learning battery
in the entire group from which A and B were drawn is .50. Is the
difference between Groups A and B significant at the .05 level? at the .01
level?


7. Calculate measures of skewness and kurtosis for the first two distribu-
tions in Chapter 2, problem 1, page 40. Compute standard errors of
Sk and Ku by the formulas given on pages 241 and 242. Determine
whether either of these distributions departs significantly from the nor-
mal form.


8. In a school of 500 pupils, 52.3% are girls; and in a second school of
300 pupils, 47.7% are girls. Is there a significant difference between
the percentages of girls enrolled in the two schools?


9. Given the following data for an item in Stanford-Binet: of 100 nine- 
year-olds, 72% pass; of 100 ten-year-olds, 78% pass. Is the item more 
difficult for nine-year-olds than for ten-year-olds? 


10. (a) To the question “Would you like to be an aviator?” 145 fifteen-
year-old boys in a high-school class of 205 answered “Yes” and 60
answered “No.” To the question “Would you like to be an engi-
neer?” 125 said “Yes” and 80 answered “No.” The data in the
table below show the number who answered “Yes” to both ques-
tions, “No” to both questions, “Yes” to one and “No” to the other.
Is desire to be an aviator significantly stronger in this group than
desire to be an engineer?


                      Ques. 1
                    No     Yes
           Yes      25     100  |  125
Ques. 2
           No       35      45  |   80

                    60     145     205

(b) In a group of 64 seventh-grade children, 32 answered Item 23 cor-
rectly and 36 answered Item 26 correctly. From the table below,
determine whether the difference in the percentage of correct
answers is significant.


                      Item 23
                    -       +
           +        10      26  |  36
Item 26
           -        22       6  |  28

                    32      32     64

11. In random samples of 100 cases each from four groups, A, B, C, and D, 
the following results were obtained:

            A         B         C         D
Mean      101.00    104.00     93.00     86.00
σ          10.00     11.00      9.60      8.50

What are the chances that, in general, the mean of 
(a) the B's is higher than the mean of the A's. 
(b) the A's is higher than the mean of the C's. 
(c) the C's is higher than the mean of the D's. 


246 • STATISTICS IN PSYCHOLOGY AND EDUCATION 


What are the chances that

(a) any B will be better than the mean A.
(b) any B will be better than the mean C.
(c) any B will be better than the mean D.


12. (a) The correlation between height and weight in a sample of 200 ten-
year-old boys is .70; and the correlation between height and weight
in a sample of 250 ten-year-old girls is .62. Is this difference sig-
nificant?

(b) In a sample of 150 high-school freshmen the correlation of two edu-
cational achievement tests is .65. If from past years the correla-
tion has averaged .60, is the present group atypical? (Does .65
differ significantly from .60?)


ANSWERS 


1. (a) No. CR = 1.20  (b) 46.5%  (c) -4.14 and 11.34
2. t = 2.3; for 11 df, significant at .05, not at .01 level
3. a, c and d
4. (a) Significant at .05, not at .01 level. Since t = 2.00 there is approxi-
   mately 1 chance in 50 that a plus difference (gain) of 4 would
   occur under the null hypothesis.
   (b) 4.98
5. (a) t = 2.49; difference in M's significant at .05 but not at .01 level.
   (b) .48 to 3.97
   (c) No. t = 1.20
6. Significant at .05 level (t = 2.57) and almost significant at .01 level.
7. Distribution    Sk/σ_Sk    Ku/σ_Ku
   1                -.23       1.55      Deviation from normality not significant
2 БІ езе 6 poe т d 27 Lo EM 
8. No. CR = 1.24
9. No. CR = .98
10. (a) Significant at .05, not at .01 level (CR = 2.03)
    (b) Not significant (CR approximately 1.00)
11. (a) 98 in 100
    (b) more than 99 in 100
    (c) more than 99 in 100

    (a) 61 in 100
    (b) 84 in 100
    (c) 95 in 100
12. (a) No. CR = 1.47  (b) No. CR = 1.09


10 


TESTING EXPERIMENTAL HYPOTHESES 


The hypothesis proposed in a psychological experiment may take
the form of a general theory or a specific inquiry. A specific hypothe- 
sis is ordinarily to be preferred to a general proposal, as the more 
definite and exact the query the greater the likelihood of a conclu- 
sive answer. In the preceding chapter, the significance of an obtained 
difference was tested against a null hypothesis. In the present chap- 
ter, we shall consider further the nature of hypotheses and shall pre- 
sent certain useful procedures and methods for answering the ques- 
tions raised by an experiment. 


I. The Null Hypothesis


1. Advantages of the null hypothesis 


In Chapter 9 the difference between two statistics was tested 
against a null hypothesis, namely, that the true difference is zero. 
The null hypothesis is not confined to zero differences nor to the dif-
ferences between statistics. Other forms of this hypothesis assert
that the results found in an experiment do not differ significantly 
from results to be expected on a probability basis or stipulated in 
terms of some theory. A null hypothesis, as we have said on page 213, 
is ordinarily more useful than other hypotheses because it is exact. 
Hypotheses other than the null can, to be sure, be stated exactly: we 
may, for example, assert that a group which has received special 
training will be 5 points on the average ahead of an untrained (con- 
trol) group. But it is difficult to set up such precise expectations in 
most experiments. And for this reason it is usually advisable to test
against a null hypothesis, rather than some other, if this can be done.

It is sometimes not fully understood that the rejection of a null 
hypothesis does not immediately force acceptance of a contrary 
view * (see p. 215). The extrasensory perception (ESP) experiments †
offer a good illustration of what is meant by this statement.
In a typical ESP experiment, a pack of 25 cards is used. There are 
5 different symbols on these cards, each symbol appearing on 5 cards. 
In guessing through the pack of 25, the probability of chance success 
with each card is 1/5. And the number of correct “calls” in a pack 
of 25 should be 5. If a subject calls the cards correctly much in 
excess of chance expectation (i.e., in excess of 5) the null (chance) 
hypothesis is rejected. But rejection of the chance hypothesis does 
not force acceptance of ESP as the cause of the extra-chance result. 
Before this claim can be made, one must demonstrate in follow-up 
experiments that extra-chance results are obtained when all likely 
causes, such as runs of cards, visual and other cues, poor shuffling 
and recording, and the like have been eliminated. If under rigid con- 
trols calls in excess of chance are consistently obtained, we may 
reject the null (chance) hypothesis and accept ESP. But the ac- 
ceptance of a positive hypothesis—it should be noted—is the end 
result of a series of careful experiments. And moreover, it is a logical 
and not primarily a statistical conclusion. 


2. Testing experimentally observed results against the direct determina- 
tion of probable outcomes 


The null hypothesis is often useful when we wish to compare 
observed results with those to be expected by “chance.” Several 
examples will illustrate the methods to be employed. 


Example (1) Two tones, differing slightly in pitch, are to be 
compared in an experiment. The tones are presented in succession, 
the subject being instructed to report the second as higher or lower 
than the first. Presentation is in random order. In ten trials a sub- 
ject is right in his judgment seven times. Is this result significant, 
i.e., better than chance? 


* Morgan, J. J. B., “Credence Given to One Hypothesis Because of the Over-
throw of Its Rivals,” Amer. Jour. Psychol., 1945, 58, 54-64.

† Rhine, J. B., et al., Extra-sensory Perception after Sixty Years (New York:
Henry Holt and Co., 1940).

Since the subject is either right or wrong in his judgment, and since
judgments are separate and independent, we may test our result
against the binomial expansion (p. 90). Ten judgments may be
taken as analogous to ten coins; a right judgment corresponds to a 
head, say, a wrong judgment to a tail. The odds are even that any 
given judgment will be right; hence in ten trials (since p = 1/2) our 
subject should in general be right five times by chance alone. The 
question, then, is whether seven “rights” are significantly greater 
than the expected five. From page 90 we find that upon expanding 
$(p + q)^{10}$ the probability of 10 right judgments is 1/1024; of 9 right
and one wrong, 10/1024; of 8 right and 2 wrong, 45/1024; and of 7 
right and 3 wrong, 120/1024. Adding these fractions we get 176/1024, 
or .172 as the probability of 7 or more right judgments by chance 
alone. The probability of just 7 rights is 120/1024, or approximately
.12. Neither of these results is significant at the .05 level of confi-
dence (p. 186) and accordingly the null hypothesis must be retained.
On the evidence there is no reason to believe that our subject’s 
judgments are really better than chance expectation. 

Note that to get 10 right is highly significant (the probability is 
approximately .001); to get 9 or 10 right is also significant (the prob-
ability is 1/1024 + 10/1024, or approximately .01). To get 8 or
more right is almost significant at the .05 level (the probability is
.055); but any number right less than 8 fails to reach our standard.
The situation described in Example (1) occurs in a number of ex- 
periments—whenever, for example, objects, weights, lights, test 
items, or other stimuli are to be compared, the odds being 50:50 that 
a given judgment is correct. 
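
The direct determination used in Example (1) can be sketched in a few lines of Python. This is only an illustrative check, not part of the original procedure; the function name `binomial_upper_tail` is our own, and exact fractions are used so that the results can be compared with the 1024ths quoted above.

```python
from fractions import Fraction
from math import comb


def binomial_upper_tail(n, k, p=Fraction(1, 2)):
    """Probability of k or more successes in n independent trials,
    each with chance p of success (computed exactly as a fraction)."""
    q = 1 - p
    return sum(comb(n, r) * p**r * q**(n - r) for r in range(k, n + 1))


# Ten judgments, chance of a right judgment 1/2 (Example 1).
for k in (7, 8, 9, 10):
    tail = binomial_upper_tail(10, k)
    print(f"{k} or more right: {tail} = {float(tail):.3f}")
```

Under these assumptions the tail for 7 or more right reduces to 176/1024, the figure obtained in the text by adding the separate terms of the expansion.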

Example (2) Ten photos, 5 of feeble-minded and 5 of normal 
children (of the same age and sex), are presented to a subject who 
claims he can identify the feeble-minded from their photographs. 
The subject is instructed to designate which five photographs are 
those of feeble-minded children. How many photos must our sub- 
ject identify correctly before the null hypothesis is disproved? 

Since there are 5 feeble-minded and 5 normal photos, the subject 
has a 50:50 chance of success with each photo and the method of 
Example (1) could be used. A better test,* however, is to determine
the probability that a particular set of 5 photos (namely, the right 
five) will be selected from all possible sets of 5 which may be drawn 
from the 10 given photos. To find how many combinations of 5 
photos can be drawn from a set of 10, we may use conveniently the 
formula for the combination of 10 things taken 5 at a time. This
formula * is written $C^{10}_{5} = \frac{10!}{5!\,5!} = 252$. The symbol $C^{10}_{5}$ is read
“the combinations of ten things taken five at a time”; 10! (read
“10 factorial”) is 10·9·8·7·6·5·4·3·2·1; and 5! is 5·4·3·2·1.

* Fisher, R. A., The Design of Experiments (London: Oliver and Boyd,
1935), Chapter 2, pp. 26-29 especially.

It is possible, therefore, to draw 252 combinations of 5 from a set
of 10, and accordingly there is one chance in 252 that a judge will 
select the 5 correct photos out of all possible sets of 5. If he does 
select the right 5, this result is obviously significant (the probability 
is approximately .004) and the null hypothesis must be rejected. 
Suppose that our judge's set of 5 photos contains 4 feeble-minded 
and one normal picture; or 3 feeble-minded and 2 normal pic-
tures. Is either of these results significant? The probability of 4 right
selections and one wrong selection by chance is $\frac{C^{5}_{4} \times C^{5}_{1}}{252}$, i.e., the
product of the number of ways 4 rights can be selected from the
5 feeble-minded pictures times the number of ways one wrong can be
selected from the 5 normal pictures divided by the total number of
combinations of 5. Calculation shows this result to be 25/252 or 1/10
(approximately) and hence not significant at the .05 level. The prob-
ability of getting 3 right and 2 wrong is given by $\frac{C^{5}_{3} \times C^{5}_{2}}{252}$, namely,
the product of the number of ways 3 pictures can be selected from 5
(the 5 feeble-minded pictures) times the number of ways 2 pictures
can be selected from the 5 normal pictures divided by the total num-
ber of combinations of 5. This result is 100/252 or slightly greater
than 1/3, and is clearly not significant.

Our subject disproves the null hypothesis, then, only when all 5 
feeble-minded pictures are correctly chosen. The probabilities of 
various combinations of right and wrong choices are given below— 
they should be verified by the student: 

Probability of all 5R =   1/252
     "      "     4R =  25/252
     "      "     3R = 100/252
     "      "     2R = 100/252
     "      "     1R =  25/252
     "      "     0R =   1/252


* The general formula for the combinations of n things taken r at a time
is $C^{n}_{r} = \frac{n!}{r!\,(n-r)!}$.

It may be noted that by increasing the number of pictures of
feeble-minded and normal from 10 to 20, say, the sensitiveness of
the experiment can be considerably enhanced. With 20 pictures it is
not necessary to get all 10 feeble-minded photos right in order to
achieve a significant result. In fact, 8 right is nearly significant at
the .01 level as shown below.


$C^{20}_{10} = \frac{20!}{10!\,10!} = 184,756$

Combinations     Frequency     Prob. ratio (freq. ÷ 184,756)
10R   0W               1            .000005
 9R   1W             100            .0005
 8R   2W            2025            .011
 7R   3W           14400            .078
 6R   4W           44100            .238
 5R   5W           63504            .343
 4R   6W           44100            .238
 3R   7W           14400            .078
 2R   8W            2025            .011
 1R   9W             100            .0005
 0R  10W               1            .000005
                 184,756
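
The counting argument of Example (2) may be checked with the short Python sketch below. It is offered only as an illustration under the stated null hypothesis (every set of chosen photos equally likely); the function names are hypothetical and do not come from the text.

```python
from fractions import Fraction
from math import comb


def prob_exactly_right(n_target, n_other, n_right):
    """Chance that a random choice of n_target photos (out of
    n_target + n_other) contains exactly n_right of the target photos."""
    n_wrong = n_target - n_right
    ways = comb(n_target, n_right) * comb(n_other, n_wrong)
    return Fraction(ways, comb(n_target + n_other, n_target))


def prob_at_least_right(n_target, n_other, n_right):
    """Chance of n_right or more correct choices under the null hypothesis."""
    return sum(prob_exactly_right(n_target, n_other, k)
               for k in range(n_right, n_target + 1))


# Ten photos (5 and 5), then twenty photos (10 and 10).
for right in range(5, -1, -1):
    print(right, "right of 5:", prob_exactly_right(5, 5, right))
print("8 or more right of 10:", float(prob_at_least_right(10, 10, 8)))
```

With 20 photos, the chance of 8 or more right is the sum of the first three frequencies in the table divided by 184,756, which is the sense in which 8 right is "nearly significant at the .01 level."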


3. Testing experimentally observed results against probabilities calcu- 
lated from the normal curve 


When the number of observations or the number of trials is large,
direct calculation of expectations by expanding the binomial
$(p + q)^{n}$ becomes highly laborious. Since $(p + q)^{n}$ yields a distribu-
tion (p. 91) which is essentially normal when n is large, in many
experiments the normal curve may be usefully employed to provide
expected results under the null hypothesis. An example will make
the method clear.


Example (3) In answering a test of 100 true-false items, a sub- 
ject gets 60 right. Is it likely that the subject merely guessed? 


As there are only two possible answers to each item, one of which
is right and the other wrong, the probability of a correct answer to
any item is 1/2, and our subject should by chance answer 1/2 of 100
or 50 items correctly. Letting p equal the probability of a right
answer, and q the probability of a wrong answer, we could, by ex-
panding the binomial $(p + q)^{100}$, calculate the probability of various
combinations of rights and wrongs on the null hypothesis. When the
exponent of the binomial (here, number of items) is as large as 100,
however, the resulting distribution is very close to the normal prob-
ability curve (p. 87) and may be so treated with little error.


Figure 49 illustrates the solution of this problem. The mean
of the curve is set at 50. The SD of the probability distribution
found by expanding $(p + q)^{n}$ is $\sigma = \sqrt{npq}$; hence for $(p + q)^{100}$,
$\sigma = \sqrt{100 \times 1/2 \times 1/2}$ or 5. A score of 60 covers the interval on the
baseline from 59.5 up to 60.5. The lower limit of 60 is 1.90σ removed
from the mean $\left(\frac{59.5 - 50}{5} = 1.90\right)$; and from Table A we find that
2.87% of the area of a normal curve lies above 1.90σ.* There are only
three chances in 100 that a score of 60 (or more) would be made if
the null hypothesis were true. A score of 60, therefore, is significant
at the .05 level. We may reject the null hypothesis with some con-
fidence and conclude that our subject could not have been simply
guessing.

Note that the problem above could have been solved equally well
in terms of percentages. We should expect our subject to get 50%
of the items right by guessing. The SD of this percentage is
$\sqrt{\frac{50\% \times 50\%}{100}}$ or 5%. A score of 60% (lower limit 59.5%) is 9.5%
or 1.9σ distant from the middle of the curve. We interpret this result in
exactly the same way as that above.

* Note that only one end of the normal curve is used. See page 217.
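
The normal-curve solution of Example (3) can be sketched as follows. This is an illustrative check only: the function name is hypothetical, and Python's `statistics.NormalDist` is used here in place of Table A.

```python
from math import sqrt
from statistics import NormalDist


def normal_upper_tail(n, p, score):
    """Normal approximation to the chance of `score` or more successes in n
    trials, measuring from the lower limit of the score (score - .5)."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    z = (score - 0.5 - mean) / sd
    return z, 1 - NormalDist().cdf(z)


# 60 right on 100 true-false items, chance of a right answer 1/2.
z, prob = normal_upper_tail(100, 0.5, 60)
print(f"z = {z:.2f}, one-tailed probability = {prob:.4f}")
```

Only the upper end of the curve is used, in keeping with the one-tailed reading of the problem.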


Example (4) A multiple-choice test of 60 items provides four
possible responses to each item. How many items should a subject 
answer correctly before we may feel sure that he knows something 
about the test material? 


Since there are four responses to each item, only one of which is 
correct, the probability of a right answer by guessing is 1/4, of a 
wrong answer 3/4. The final score to be expected if a subject knows 
nothing whatever about the test and simply guesses is 1/4 × 60 or 15.
Our task, therefore, is to determine how much better than 15 a sub- 
ject must score in order to demonstrate real knowledge of the 
material. 

This problem can be solved by the methods of Example (1). By
expanding the binomial $(p + q)^{60}$, for instance, in which p = 1/4,
q = 3/4, and n = 60, we can determine the probability of the occur-
rence of any score from 0 to 60. The direct determination of prob-
abilities from the binomial expansion is straightforward and exact
but the calculation is tedious. Fortunately, therefore, a satisfactory
approximation to the answer we want can be obtained by using the
normal distribution to determine probabilities, as in Example (3).
The mean of our “chance” distribution is 1/4 of 60 or 15; and the
$\sigma = \sqrt{npq} = \sqrt{60 \times 1/4 \times 3/4}$ or 3.35. From Table A we know that
5% of the frequency in a normal distribution lies above 1.65σ. Mul-
tiplying our obtained σ (3.35) by 1.65, we get 5.53; and this value
when added to 15 gives us 20.5 as the point above which lies 5% of
the "chance" distribution of scores. A score of 21 (20.5 to 21.5),
therefore, may be regarded as significant, and if a subject achieves
such a score we can be reasonably sure that he is not merely
guessing.

For a higher level of assurance, we may take that score which 
would occur by chance only once in 100 trials. From Table A, 1%
of the frequency in the normal curve lies above 2.33σ. This point is
7.81 (3.35 × 2.33) above 15 or at 22.8. A score of 23, therefore, or a
higher score is very significant; only once in 100 trials would a sub- 
ject achieve such a score by guessing. 
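
The two cutting points of Example (4) may be reproduced with a brief Python sketch. It is a rough check rather than the book's own procedure: the function name is hypothetical, and `NormalDist().inv_cdf` supplies the normal deviates (about 1.645 and 2.326) that the text rounds to 1.65 and 2.33 from Table A.

```python
from math import sqrt
from statistics import NormalDist


def chance_cutoff(n, p, alpha):
    """Score above which only a proportion `alpha` of pure-guessing
    scores should fall, using the normal approximation."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    z = NormalDist().inv_cdf(1 - alpha)
    return mean + z * sd


# Sixty items, four choices each, so the chance of a right guess is 1/4.
cut_05 = chance_cutoff(60, 0.25, 0.05)
cut_01 = chance_cutoff(60, 0.25, 0.01)
print(f".05 point: {cut_05:.1f}   .01 point: {cut_01:.1f}")
```

Small differences in the second decimal place from the hand calculation are to be expected, since the deviates are not rounded here.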

Use of the normal probability curve in the solution of problems like 
this always involves a degree of approximation. When p differs con-
siderably from 1/2 and n is small, the distribution resulting from the
expansion of $(p + q)^{n}$ is skewed and is not therefore accurately de-
scribed by the normal curve. Under these circumstances one must
resort to the direct determination of probabilities as in Example (1).
When n is large, however, and p not far from 1/2, the normal dis-
tribution can be safely used, as will be shown by the chi-square tests
on page 261.


II. The χ² (Chi-square) Test and the Null Hypothesis


The chi-square test represents a useful method of comparing
experimentally obtained results with those to be expected theoreti-
cally on some hypothesis.* The formula for chi-square (χ²) is stated
as follows:

$$\chi^{2} = \sum \left[ \frac{(f_o - f_e)^{2}}{f_e} \right] \qquad (66)$$

(chi-square formula for testing agreement between
observed and expected results)


in which
f_o = frequency of occurrence of observed or experimentally deter-
mined facts;
f_e = expected frequency of occurrence on some hypothesis.


The differences between observed and expected frequencies are
squared and divided by the expected number in each case, and the
sum of these quotients is χ². The more closely the observed results
approximate to the expected, the smaller the chi-square and the
closer the agreement between observed data and the hypothesis being
tested. Contrariwise, the larger the chi-square the greater the prob-
ability of a real divergence of experimentally observed from expected
results. To evaluate chi-square, we enter Table E with the computed
value of chi-square and the appropriate number of degrees of free-
dom. The number of df = (r - 1) (c - 1) in which r is the number
of rows and c the number of columns in which the data are tabulated.
From Table E we read P, the probability that a χ² as large as the one
obtained would arise through sampling fluctuations alone if the
hypothesis being tested were true. Several illustrations of the
chi-square test will be given in the sections following.


* Lewis, D., Quantitative Methods in Psychology (Ann Arbor: Edwards 
Bros., Tod. Chap. 8.


1. Testing the divergence of observed results from those expected on the
hypothesis of equal probability (null hypothesis)


Example (1) Forty-eight subjects are asked to express their
attitude toward the proposition “Should the United States join a
Security Organization of Nations?" by marking F (favorable),
I (indifferent) or U (unfavorable). Of the members in the group,
24 marked F, 12 I, and 12 U. Do these results indicate a significant
trend of opinion?


The observed data (f_o) are given in the first row of Table 26. In
the second row is the distribution of answers to be expected on the
null hypothesis (f_e), if each answer is selected equally often. Below
the table are entered the differences (f_o - f_e). Each of these differ-
ences is squared and divided by its f_e (64/16 + 16/16 + 16/16) to
give χ² = 6.


TABLE 26
                                  Answers
                     Favorable   Indifferent   Unfavorable

Observed (f_o)          24           12            12      |  48
Expected (f_e)          16           16            16      |  48

(f_o - f_e)              8            4             4
(f_o - f_e)²            64           16            16
(f_o - f_e)²/f_e         4            1             1


χ² = Σ[(f_o - f_e)²/f_e] = 6     df = 2     P = .05 (Table E)


The degrees of freedom in the table may be calculated from the 
formula df = (r - 1) (c - 1) to be (3 - 1) (2 - 1) or 2. Or, the de-
grees of freedom may be found directly in the following way: Since
we know the row totals to be 48, when two entries are made in a row 
the third is immediately fixed, is not "free." When the first two 
entries in row 1 are 24 and 12, for example, the third entry must be 
12 to make up 48. Since we also know the sums of the columns, 
only one entry in a column is free, the second being fixed as soon as 
the first is tabulated. There are, then, two degrees of freedom for 
rows and one degree of freedom for columns, and 2 X 1 = 2 degrees 
of freedom for the table.

Entering Table E we find in row df = 2, a χ² of almost 6 (actually,
5.991) in the column headed .05. A P of .05 means that should we
repeat this experiment, only once in 20 trials would a χ² of 6 (or
more) occur if the null hypothesis were true. Our result may be 
marked “significant at the .05 level," therefore, on the grounds that 
divergence of observed from expected results is too unlikely of occur- 
rence to be accounted for solely by sampling fluctuations. We reject 
the “equal answer" hypothesis and conclude that our group really 
favors the proposition. In general we may safely discard a null 
hypothesis whenever P is .05 or less. 
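
The computation of Example (1) may be sketched in Python as a check on the arithmetic. The function name `chi_square` is our own. In place of Table E the sketch uses the closed form P = e^(-χ²/2), which holds for 2 degrees of freedom only; it is an assumption of this illustration and not a general substitute for the table.

```python
from math import exp


def chi_square(observed, expected):
    """Sum of (f_o - f_e)^2 / f_e over all categories, as in formula (66)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [24, 12, 12]                      # Favorable, Indifferent, Unfavorable
equal_share = sum(observed) / len(observed)  # 16 expected in each category
expected = [equal_share] * len(observed)

chi2 = chi_square(observed, expected)
p_value = exp(-chi2 / 2)  # valid for df = 2 only
print(f"chi-square = {chi2:.2f}, P = {p_value:.3f}")
```

The P so obtained falls just under .05, in agreement with the reading from Table E.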


Example (2) The items in an attitude scale are answered by
underlining one of the following phrases: Strongly approve, ap-
prove, indifferent, disapprove, strongly disapprove. The distribu-
tion of answers to an item marked by 100 subjects is shown in
Table 27. Do these answers diverge significantly from the distribu- 
tion to be expected if there are no preferences in the group? 


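TABLE 27
                    Strongly              Indiffer-     Dis-      Strongly
                    approve    Approve      ent       approve    disapprove

Observed (f_o)         23         18         24          17          18     |  100
Expected (f_e)         20         20         20          20          20     |  100

(f_o - f_e)             3          2          4           3           2
(f_o - f_e)²            9          4         16           9           4
(f_o - f_e)²/f_e       .45        .20        .80         .45         .20


χ² = 2.10     df = 4     P lies between .70 and .80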


On the null hypothesis of "equal probability" 20 subjects may be
expected to select each of the 5 possible answers. Squaring the
(f_o - f_e), dividing by the expected result (f_e), and summing, we
obtain a χ² of 2.10. df = (5 - 1) (2 - 1) or 4. From Table E, read-
ing across from row df = 4, we locate a χ² of 2.195 in column .70.
This χ² is nearest to our calculated value of 2.10, which lies between
the entries in columns .70 and .80. It is sufficiently accurate to de-
scribe P as lying between .70 and .80 without interpolation. Since
this much divergence from the null hypothesis, namely, 2.10, can be
expected to occur upon repetition of the experiment in approximately
75% of the trials, χ² is clearly not significant and we must retain the
null hypothesis. There is no conclusive evidence of either a strongly
favorable or unfavorable attitude toward this item.


2. Testing the divergence of observed results from those expected on the 
hypothesis of a normal distribution 


Our hypothesis may assert that the frequencies of an event which 
we have observed really follow the normal distribution instead of 
being equally probable. An example illustrates how this hypothesis 
may be tested by chi-square. 


Example (3) Forty-two salesmen have been classified into 3 
groups—very good, satisfactory, and poor—by a consensus of sales 
managers. Does this distribution of ratings differ significantly 
from that to be expected if selling ability is normally distributed? 


TABLE 28
                      Good    Satisfactory    Poor

Observed (f_o)         16         20            6     |  42
Expected (f_e)         6.7       28.6          6.7    |  42

(f_o - f_e)            9.3        8.6           .7
(f_o - f_e)²          86.49      73.96          .49
(f_o - f_e)²/f_e      12.90       2.59          .07


χ² = 15.56     df = 2     P is less than .01


The entries in row 1 give the number of men classified in each of
the 3 categories. In row 2, the entries show how many of the 42 sales-
men may be expected to fall in each category on the hypothesis of a
normal distribution. These last entries were found by dividing the
baseline of a normal curve (taken to extend over 6σ) into 3 equal
segments of 2σ each. From Table A, the proportion of the normal
distribution to be found in each of these segments is as follows:


                                   Proportion
Between +3.00σ and +1.00σ             .16
   "    +1.00σ and -1.00σ             .68
   "    -1.00σ and -3.00σ             .16

                                     1.00


These proportions of 42 have been calculated and are entered in
Table 28. The χ² in the table is 15.56 and df = (3 - 1) (2 - 1) or
2. From Table E it is clear that this χ² lies beyond the limits of the
table, hence P is listed simply as less than .01. The discrepancy
between observed and expected values is so great that the hypothesis
of a normal distribution of selling ability must be rejected. Too
many men have been described as good, and too few as satisfactory,
to make for agreement with our hypothesis.
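
The steps of Example (3) can be sketched in Python. This is an illustration under stated assumptions: the names are hypothetical, `statistics.NormalDist` stands in for Table A, and the expected frequencies are rounded to one decimal as in Table 28, so that the result agrees with the tabled χ² to within rounding.

```python
from statistics import NormalDist


def chi_square(observed, expected):
    """Sum of (f_o - f_e)^2 / f_e over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


nd = NormalDist()
# Areas of the three 2-sigma segments between -3 sigma and +3 sigma.
exact_props = [nd.cdf(3) - nd.cdf(1), nd.cdf(1) - nd.cdf(-1), nd.cdf(-1) - nd.cdf(-3)]
print([round(p, 4) for p in exact_props])

# The text rounds these areas to .16, .68 and .16 so that they total 1.00.
book_props = [0.16, 0.68, 0.16]
observed = [16, 20, 6]                       # Good, Satisfactory, Poor
expected = [round(sum(observed) * p, 1) for p in book_props]

chi2 = chi_square(observed, expected)
print(expected, f"chi-square = {chi2:.2f}")
```

Using the unrounded areas instead of .16 and .68 changes the expected frequencies slightly and hence χ² as well, but not the conclusion.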


3. The chi-square test when table entries are small 


When the entries in a table are fairly large, χ² gives an estimate of
divergence from hypothesis which is close to that obtained by other
measures of probability. But χ² is not stable when computed from a
table in which any theoretical frequency is less than 5. Moreover,
when the table is 2 × 2 fold (when df = 1), χ² is subject to considera-
ble error unless a correction for continuity (called Yates’ correction)
is made. Reasons for making this correction and its effect upon χ²
can best be seen by working through the examples following.


Example (4) In Example (1), page 248, an observer gave seven 
correct judgments in ten trials. The probability of a right judg- 
ment was 1/2 in each instance, so that the expected number of 
correct judgments was five. Test our subject’s deviation from the 
null hypothesis by computing chi-square and compare the P with 
that found by direct calculation. 


TABLE 29
                        Right    Wrong

Observed (f_o)            7        3     |  10
Expected (f_e)            5        5     |  10

(f_o - f_e)               2        2
Correction (-.5)         1.5      1.5
(f_o - f_e)²             2.25     2.25
(f_o - f_e)²/f_e          .45      .45


χ² = .90
df = 1
P = .356 (by interpolation in Table E)
½P = .178

Calculations in Table 29 follow those of previous tables except for
the correction which consists in subtracting .5 from each (f_o - f_e)
difference. In applying the χ²-test we assume that adjacent fre-
quencies are connected by a continuous and smooth curve (like the
normal curve) and are not discrete numbers. In 2 × 2 fold tables,
especially when entries are small, the χ² curve is not continuous.
Hence, the deviation of 7 from 5 must be written as 1.5 (6.5 - 5)
instead of 2 (7 - 5), as 6.5 is the lower limit of 7 in a continuous
series. In like manner the deviation of 3 from 5 must be taken from
the upper limit of 3, namely, 3.5 (see Fig. 49). Still another change
in procedure must be made in order to have the probability obtained
from χ² agree with the direct determination of probability. P in the
χ² table gives the probability of 7 or more right answers and of 3 or
less right answers, i.e., takes account of both ends of the probability
curve (see p. 217). We must take 1/2 of P, therefore, if we want only
the probability of 7 or more right answers. Note that the P/2 of .178
is very close to the P of .172 got by the direct method on page 249.
If we repeated our test we should expect a score of 7 or better about
17 times in 100 trials. It is clear, therefore, that the obtained score
is not significant and does not refute the null hypothesis.

It should be noted that had we omitted the correction for continu-
ity, chi-square would have been 1.60 and P/2 (by interpolation in
Table E), .095. Failure to use the correction causes the probability
of a given result to be greatly underestimated and the chances of its
being called significant considerably increased.

When the expected entries in a 2 × 2 fold table are the same (as
in Tables 29, 30) the formula for chi-square may be written in a
somewhat shorter form as follows:

$$\chi^{2} = \frac{2\,(|f_o - f_e| - .5)^{2}}{f_e} \qquad (67)$$

(short formula for χ² in 2 × 2 fold tables when expected
frequencies are equal)


Applying formula (67) to Table 29 we get a chi-square of
$\frac{2(1.5)^{2}}{5} = .90$.

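
The correction for continuity can be sketched in Python as follows. The function names are hypothetical. Instead of interpolating in Table E, the sketch uses the relation P = erfc(√(χ²/2)), which holds for 1 degree of freedom only; the P so computed may therefore differ a little in the third decimal from the interpolated value.

```python
from math import erfc, sqrt


def yates_chi_square(observed, expected):
    """Chi-square with .5 subtracted from each absolute (f_o - f_e)
    difference before squaring (correction for continuity)."""
    return sum((abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected))


def p_for_one_df(chi2):
    """Two-ended probability of a chi-square this large when df = 1."""
    return erfc(sqrt(chi2 / 2))


# Seven right and three wrong against an expectation of five and five.
chi2 = yates_chi_square([7, 3], [5, 5])
half_p = p_for_one_df(chi2) / 2
print(f"chi-square = {chi2:.2f}, P/2 = {half_p:.3f}")
```

Run on these figures, the sketch gives a χ² of .90 and a P/2 of about .17, which may be compared with the .172 found by the direct method.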
Example (5) In Example (3), page 251, a subject achieved a 
score of 60 right on a test of 100 true-false items. From the chi- 
square test, determine whether this subject was merely guessing. 
Compare your result with that found on page 252 when the normal 
curve hypothesis was employed.


TABLE 30
                        Right    Wrong

Observed (f_o)           60       40     |  100
Expected (f_e)           50       50     |  100

(f_o - f_e)              10       10
Correction (-.5)         9.5      9.5
(f_o - f_e)²            90.25    90.25
(f_o - f_e)²/f_e         1.81     1.81


χ² = 3.62     P = .059
df = 1        ½P = .0295 or .03


Although the cell entries in Table 30 are large, use of the correc-
tion for continuity will be found to yield a result in somewhat closer
agreement with that found on page 252 than can be obtained without
the correction. As shown in Figure 49, page 252, the probability of a
deviation of 60 or more from 50 is that part of the curve lying above
59.5. In Table E, the P of .059 gives us the probability of scores of
60 or more and of 40 or less. Hence we must take 1/2 of P (i.e.,
.0295) to give us the probability of a score of 60 or more. Agreement
between the probability given by the χ²-test and by direct calcula-
tion is very close. Note that when χ² is calculated without the correc-
tion, we get a P/2 of .024, a slight underestimation. In general, the
correction for continuity has little effect when table entries are large,
50 or more, say. But failure to use the correction even when numbers
are large may lead to some underestimation of the probability; hence
it is generally wise to use it.


Example (6) In Example (4), page 253, given a multiple-choice
test of 60 items (four possible answers to each item) we were re- 
quired to find what score a subject must achieve in order to dem- 
onstrate knowledge of the test material. By use of the normal prob- 
ability distribution, it was shown that a score of 21 is reasonably 
significant and a score of 23 highly significant. Can these results 
be verified by the chi-square test? 


In Table 31 an obtained score of 21 is tested against an expected
score of 15. In the first line of the table the observed values (f_o) are
21 right and 39 wrong; in the second line, the expected or “guess”
values are 15 right and 45 wrong. Making the correction for con-
tinuity we obtain a χ² of 2.69, a P of .10 and 1/2 P of .05. Only once
in 20 trials would we expect a score of 21 or higher to occur if the
subject were merely guessing and had no knowledge of the test
material. This result checks that obtained on page 253.


TABLE 31
                        Right    Wrong

f_o                      21       39     |  60
f_e                      15       45     |  60

(f_o - f_e)               6        6
Correction (-.5)         5.5      5.5
(f_o - f_e)²            30.25    30.25
(f_o - f_e)²/f_e         2.02      .67


χ² = 2.69     P = .10
df = 1        ½P = .05


In Table 32 a score of 23 is tested against the expected score of
15. Making the correction for continuity, we obtain a χ² of 5.00
which yields a P of .0275 and 1/2 P of .0138. Again this result closely
checks the answer obtained on page 253 by use of the normal prob-
ability curve.


TABLE 32


                      Right     Wrong     Total

fo                      23        37        60
fe                      15        45        60

(fo − fe)                8         8
Correction (−.5)         7.5       7.5
(fo − fe)²              56.25     56.25
(fo − fe)²/fe            3.75      1.25

χ² = 5.00      df = 1      P = .0275      ½P = .0138 or .01


4. The chi-square test when table entries are in percentages 


The chi-square test should not be used with percentage entries
unless a correction for size of sample is made. This follows from
the fact that in dealing with probability the significance of an event
depends upon its actual frequency and is not shown by its percentage
occurrence. For a penny to fall heads eight times in ten tosses is not
as significant as for the penny to fall heads eighty times in 100
tosses, although the percentage occurrence is the same in both cases.
If we write the entries in Table 29 as percentages, we have


                        R         W       Total

fo                     70%       30%      100%
fe                     50%       50%      100%

(fo − fe)              20%       20%
Correction* (−5%)      15%       15%
(fo − fe)²             225       225

χ² (percent) = 2(225)/50 = 9, by (67)

χ² = 9 × 10/100 = .90 (Table 29)


It is clear that in order to bring χ² to its proper value in terms of
original numbers we must multiply the "percent" χ² by 10/100 to
give .90. A χ² calculated from percentages must always be multiplied
by N/100 (N = number of observations) in order to adjust it to the
actual frequencies in the given sample.
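The N/100 adjustment may be illustrated with a short Python sketch. The function name is hypothetical; the correction of −.5 observation is taken as 50/N percent, as in the footnote to the percentage table, and the figures are those of Table 29 (70% and 30% observed in N = 10) and of the penny example.

```python
def chi_square_from_percents(observed_pct, expected_pct, n, correct=True):
    """Chi-square from percentage entries, rescaled to the actual sample size n."""
    # A correction of .5 observation equals 50/n percent.
    correction = 50.0 / n if correct else 0.0
    percent_chi_sq = 0.0
    for fo, fe in zip(observed_pct, expected_pct):
        deviation = max(abs(fo - fe) - correction, 0.0)
        percent_chi_sq += deviation ** 2 / fe
    # Multiply the "percent" chi-square by N/100.
    return percent_chi_sq * n / 100.0


table_29 = chi_square_from_percents([70, 30], [50, 50], n=10)

# The same 80% heads carries very different weight in 10 and in 100 tosses.
ten_tosses = chi_square_from_percents([80, 20], [50, 50], n=10)
hundred_tosses = chi_square_from_percents([80, 20], [50, 50], n=100)

print(round(table_29, 2), round(ten_tosses, 2), round(hundred_tosses, 2))
```

The identical percentages yield a far larger χ² for 100 tosses than for 10, which is why the size of the sample cannot be left out of account.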


5. The chi-square test of independence in contingency tables 


We have seen that χ² may be employed to test the agreement
between observed results and those expected on some hypothesis. A
further useful application of χ² can be made when we wish to in-
vestigate the relationship between traits or attributes which can
be classified into two or more categories. The same persons, for ex-
ample, may be classified as to hair color (light, brown, black, red)
and as to eye color (blue, gray, brown), and the correspondence in
these attributes noted. Or fathers and sons may be classified with
respect to interests or temperament or achievement and the relation-
ship of the attributes in the two groups studied.

Table 33 is a contingency table, i.e., a double entry or two-way
table in which the possession by a group of varying degrees of two
characteristics is represented. In the tabulation in Table 33, 413
persons have been classified as to "eyedness" and "handedness."
Eyedness, or eye dominance, is described as left-eyed, ambiocular, or
right-eyed; handedness as left-handed, ambidextrous, or right-
handed. Reading down the first column we find that of 118 left-eyed
persons, 34 are left-handed, 27 ambidextrous and 57 right-handed.
Across the first row we find 124 left-handed persons, of whom 34 are
left-eyed, 62 ambiocular and 28 right-eyed. The other columns and
rows are interpreted in the same way.


* From Table 29 it is clear that the correction of −.5 becomes −.5/N or
−.05; this is −5% when entries in the table are expressed as percents.


TABLE 33 Comparison of eyedness and handedness in 413 persons *


                 Left-Eyed     Ambiocular     Right-Eyed     Totals

Left-handed      34 (35.4)      62 (58.5)      28 (30.0)       124

Ambidextrous     27 (21.4)      28 (35.4)      20 (18.2)        75

Right-handed     57 (61.1)     105 (101.0)     52 (51.8)       214

Totals             118            195            100           413

I. Calculation of independence values (fe):

   118 × 124/413 = 35.4     195 × 124/413 = 58.5      100 × 124/413 = 30.0
   118 × 75/413  = 21.4     195 × 75/413  = 35.4      100 × 75/413  = 18.2
   118 × 214/413 = 61.1     195 × 214/413 = 101.0     100 × 214/413 = 51.8


II. Calculation of χ²:

   (−1.4)² ÷ 35.4 = .055     (3.5)² ÷ 58.5  = .209     (−2.0)² ÷ 30.0 = .133

   (5.6)² ÷ 21.4  = 1.465    (−7.4)² ÷ 35.4 = 1.547    (1.8)² ÷ 18.2  = .178

   (−4.1)² ÷ 61.1 = .275     (4.0)² ÷ 101.0 = .158     (.2)² ÷ 51.8   = .001

   χ² = 4.02      df = 4      P lies between .30 and .50


The hypothesis to be tested is the null hypothesis, namely, that
handedness and eyedness are essentially unrelated or independent.
In order to compute χ² we must first calculate an "independence
value" for each cell in the contingency table. Independence values
are represented by figures in parentheses within the different cells;
they give the number of people whom we should expect to find pos-
sessing the designated eyedness and handedness combinations in the
absence of any real association. The method of calculating inde-
pendence values is shown in Table 33. To illustrate with the first
entry, there are 118 left-eyed and 124 left-handed persons. If there
were no association between left-eyedness and left-handedness we
should expect to find, by chance, (118 × 124)/413 or 35.4 individuals
in our group who are left-eyed and left-handed. The reason for this
may readily be seen. We know that 118/413 of the entire group is left-
eyed. This proportion of left-eyed individuals should hold for any
sub-group, if there is no dependence of eyedness on handedness.
Hence, 118/413 or 28.6% of the 124 left-handed individuals, i.e.,
35.4, should also be left-eyed. Independence values for all cells are
shown in Table 33.

* From Woo, T. L., Biometrika, 1936, 20A, pp. 79-118.

When the expected or independence values have been computed,
we find the difference between the observed and expected values for
each cell, square each difference and divide in each instance by the
independence value. The sum of these quotients by formula (66)
gives χ². In the present problem χ² = 4.02 and df = (3 − 1)(3 − 1)
or 4. From Table E we find that P lies between .30 and .50 and hence
χ² is not significant. The observed results are close to those to be
expected on the hypothesis of independence and there is no evidence
of any real association between eyedness and handedness within our
group.
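The steps just described can be sketched in Python as follows. The function name is ours, and the cell frequencies are an assumption in one respect: only the first row and first column are stated in the text, and the remaining cells are recovered here from the independence values and deviations listed in Table 33.

```python
def chi_square_independence(table):
    """Return (chi-square, df, independence values) for a two-way frequency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    chi_sq = 0.0
    expected = []
    for i, row in enumerate(table):
        expected_row = []
        for j, fo in enumerate(row):
            # Independence value: (row total x column total) / N
            fe = row_totals[i] * col_totals[j] / n
            expected_row.append(fe)
            chi_sq += (fo - fe) ** 2 / fe
        expected.append(expected_row)

    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi_sq, df, expected


# Rows: left-handed, ambidextrous, right-handed.
# Columns: left-eyed, ambiocular, right-eyed.
observed = [
    [34, 62, 28],
    [27, 28, 20],
    [57, 105, 52],
]

chi_sq, df, expected = chi_square_independence(observed)
print(f"chi-square = {chi_sq:.2f}, df = {df}")
print(f"independence value, first cell = {expected[0][0]:.1f}")
```

No correction for continuity is applied here, in keeping with the calculation shown in Table 33.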

When the contingency table is 2 × 2 fold, χ² may be calculated
without first computing the four expected frequencies, that is, the
four independence values. Example (7) illustrates the method.


Example (7) All of the sixth-grade children in a public-school
system are given a standard achievement test in arithmetic. A
sample of 40 boys, drawn at random from the sixth-grade popula-
tion, showed 23 at or above the national norm in the test and 17
below the national norm. A random sample of 50 sixth-grade girls
showed 22 at or above the national norm and 28 below. Are the
boys really better than the girls in arithmetic? Data are arranged
in a fourfold table as follows.


             Below           At or above
             norm            norm

Boys         (A) 17          (B) 23          (A + B)  40

Girls        (C) 28          (D) 22          (C + D)  50

             (A + C) 45      (B + D) 45      N  90
In a fourfold table, chi-square is given by the following formula.*

$$\chi^2 = \frac{N(AD - BC)^2}{(A + B)(C + D)(A + C)(B + D)}$$

(Chi-square in a fourfold contingency table)


Substituting for A, B, C, D in the formula, we have

$$\chi^2 = \frac{90(374 - 644)^2}{40 \times 50 \times 45 \times 45} = 1.62$$

and for df = 1, P is almost .20. χ² is not significant and there is
no evidence that the table entries really vary from expectation, i.e.,
that there is a true sex difference in arithmetic.
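As a check on the substitution, the fourfold formula may be written as a small Python function. The function name is ours; the cell letters follow the products in the worked substitution (AD = 17 × 22 = 374, BC = 23 × 28 = 644), so that A and B are the boys below and at or above the norm, and C and D the corresponding girls.

```python
def chi_square_fourfold(a, b, c, d):
    """Chi-square for a 2 x 2 table without computing the expected frequencies."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Boys: 17 below, 23 at or above; girls: 28 below, 22 at or above.
chi_sq = chi_square_fourfold(17, 23, 28, 22)
print(f"chi-square = {chi_sq:.2f}")
```

Like the formula in the text, this sketch applies no correction for continuity.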


6. The additive property of χ²


When several χ²'s have been computed from independent experi-
ments (i.e., from tables based upon different samples), these may be
summed to give a new chi-square with df = the sum of the separate
df's. The fact that chi-squares may be added to provide an over-all
test of a hypothesis is important in many experimental studies. In
Example (7) above we have seen that the boys did slightly better
than the girls on the arithmetic achievement test, but the chi-square
of 1.62 is not large enough to indicate a superiority of boys over
girls. Suppose that three repetitions of this experiment are carried
out, in each instance groups of boys and girls [of about the same size
as in Example (7)] being drawn independently and at random from
the sixth grade and listed as scoring "at or above" or "below" the
national norm. Suppose further that the three chi-squares from
these tables are 2.71, 5.39 and .15, in each case the boys being some-
what better than the girls. We can now combine these four results
to get an over-all test of the significance of this sex difference in
arithmetic. Adding the three χ²'s to the 1.62 in Example (7) we have
a total χ² of 9.87 with 4 df's. From Table E this χ² is significant at
the .05 level, and we may be reasonably sure that sixth-grade boys
are, on the average, better than sixth-grade girls in arithmetic. It
will be noted that our four experiments taken in aggregate yield a
significant result, although only one of the χ²'s (5.39) is itself
significant. Combining the data from several experiments will often
yield a definitive result when the separate experiments taken alone
provide only indications or suggestions of a true difference.


* See page 367 for relation of χ² to phi-coefficient.
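The pooling of the four chi-squares can be sketched in Python. In place of Table E, the sketch uses the closed-form tail probability that holds when df is even; that substitution, and the function name, are our own assumptions and not part of the text.

```python
import math


def chi_square_p_even_df(chi_sq, df):
    """Upper-tail P of chi-square; closed form valid only for positive even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = chi_sq / 2.0
    term = 1.0
    series = 1.0
    for k in range(1, df // 2):
        term *= half / k
        series += term
    return math.exp(-half) * series


# Example (7) plus the three supposed repetitions; each fourfold table has 1 df.
chi_squares = [1.62, 2.71, 5.39, 0.15]
total_chi_sq = sum(chi_squares)
total_df = len(chi_squares)

p = chi_square_p_even_df(total_chi_sq, total_df)
print(f"total chi-square = {total_chi_sq:.2f}, df = {total_df}, P = {p:.3f}")
```

The resulting P falls a little below .05, in agreement with the reading from Table E.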


PROBLEMS 


1. Two sharp clicking sounds are presented in succession, the second being
   always more intense or less intense than the first. Presentation is in
   random order. In eight trials an observer is right six times. Is this
   result significant?

   (a) Calculate P directly (p. 249).
   (b) Check P found in (a) by χ²-test (p. 258). Compare P's found
       with and without correction for continuity.


2. A multiple-choice test of fifty items provides five responses to each
   item. How many items must a subject answer correctly
   (a) to reach the .05 confidence level?
   (b) to reach the .01 confidence level?


3. A multiple-choice test of thirty items provides three responses for each
   item. How many items must a subject answer correctly before the
   chances are only one in fifty that he is merely guessing?


4. A pack of fifty-two playing cards contains four suits (diamonds, clubs,
   spades, and hearts). A subject "guesses" through the pack of cards,
   naming only suits, and is right eighteen times.

   (a) Is this result better than "chance"? (Hint: In using the probabil-
       ity curve compute area to 17.5, lower limit of 18.0, rather than to
       18.0.)

   (b) Check your answer by the χ²-test (p. 257).


5. Twelve samples of handwriting, six from normal and six from insane
   adults, are presented to a graphologist who claims he can identify the
   writing of the insane. How many "insane" specimens must he recognize
   correctly in order to prove his contention?


6. The following judgments were classified into six categories taken to
   represent a continuum of opinion:


                              Categories
                  I     II    III    IV     V     VI    Total
   Judgments:    48     61     82    91    57     45     384


   (a) Test given distribution versus "equal probability" hypothesis.
   (b) Test given distribution versus normal distribution hypothesis.


7. In 120 throws of a single die, the following distribution of faces was
   obtained:

                                Faces
                  1     2     3     4     5     6    Total
   Observed
   frequencies:  30    25    18    10    22    15     120


   Do these results constitute a refutation of the "equal probability"
   (null) hypothesis?


8. The following table represents the number of boys and the number of
   girls who chose each of the five possible answers to an item in an atti-
   tude scale.


            Approve                                        Disapprove
            Strongly   Approve   Indifferent   Disapprove   Strongly   Total
   Boys        25         30          10            25          10       100
   Girls       10         15           5            15          15        60


   Do these data indicate a significant sex difference in attitude toward
   this question? [Note: Test the "independence (null) hypothesis."]


9. The table below shows the number of normals and abnormals who
   chose each of the three possible answers to an item on a neurotic ques-
   tionnaire.


                 Yes     No      ?     Total
   Normals        14     66     10       90
   Abnormals      27     66      7      100
                  41    132     17      190


   Does this item differentiate between the two groups? Test the inde-
   pendence hypothesis.


10. From the table below, determine whether Item 27 differentiates be-
    tween two groups of high and low general ability.


    Numbers of Two Groups Differing in General
    Ability Who Pass Item 27 in a Test


                     Passed   Failed   Total

    High Ability       31       19       50
    Low Ability        24       26       50
                       55       45      100


11. Five χ²'s computed from fourfold tables in independent replications of
    an experiment are .50, 4.10, 1.20, 2.79 and 5.41. Does the aggregate of
    these tests yield a significant χ²?
ANSWERS

1. (a) P = .145; not significant
   (b) P = .145 when corrected; .085 uncorrected

2. (a) 15
   (b) 17

3. 15

4. Probability of 18 or better is .08; not significant

5. 5 or 6 (Probability of 5 or 6 = 37/924 = .04)

6. (a) χ² = 27; P less than .01 and hypothesis of "equal probability"
       must be discarded.
   (b) χ² = 356; P is less than .01, and the deviation from the normal
       hypothesis is significant.

7. Yes. χ² = 12.90, df = 5, and P is between .02 and .05.

8. No. χ² = 7.03, df = 4, and P is between .20 and .10.

9. No. χ² = 4.14, df = 2, and P is between .20 and .10.

10. No. χ² = 1.98, df = 1, and P lies between .20 and .10.

11. Yes. χ² = 14.00, df = 5, and P lies between .02 and .01.


ANALYSIS OF VARIANCE IN DETERMINING
THE SIGNIFICANCE OF DIFFERENCES
BETWEEN MEANS


* 


The methods described under analysis of variance include (1) a
variety of procedures called experimental designs, as well as (2) cer- 
tain statistical techniques devised for use with these procedures. The 
statistics used in analysis of variance are not new (as they are some- 
times thought to be) but are, in fact, adaptations of formulas and 
methods described earlier in this book. The experimental designs, on 
the other hand, are in several instances new at least to psychology. 
These systematic approaches often provide more efficient and exact 
tests of experimental hypotheses than do the conventional methods 
ordinarily employed. 

This chapter will be concerned with the application of analysis of 
variance to the important and often-encountered problem of deter- 
mining the significance of the difference between means. This topic 
has been treated by classical methods in Chapter 9, and the present 
chapter will give the student an opportunity to contrast the relative 
efficiency of the two approaches and to gain, as well, some notion of 
the advantages and disadvantages of each. Treatment of other and 
more complex experimental designs through analysis of variance is 
beyond the scope of this book. After this introductory chapter, how- 
ever, the interested student should be able to follow the more com- 
prehensive treatments of analysis of variance in the references listed 
below.* 

* Edwards, A. L., Experimental Design in Psychological Research (New
York: Rinehart and Co., 1950).

McNemar, Q., Psychological Statistics (New York: John Wiley and Sons).
The plan of this chapter is to give, first, an elementary account of
the principles of variance analysis. The problem of determining the
significance of the difference between two means will then be con-
sidered: (1) when the means are independent, i.e., when the sets of
measures from which the M's are derived are uncorrelated, and (2)
when M's are not independent because of correlation among the
different sets of measures or scores.


I. How Variance Is Analyzed
1. When pairs of scores are added to yield a composite score


While the variability within a set of scores is ordinarily given by
the standard deviation or σ, variability may also be expressed by the
"variance" or σ². A very considerable advantage of variances over
SD's is the fact that variances are often additive and the sums of
squares upon which variances are based always are. A simple ex-
ample will illustrate this. Suppose that we add the two independent
(uncorrelated) scores X and Y made by Subject A on tests X and Y
to give the composite score Z (i.e., Z = X + Y). Now if we add the
X and Y scores for each person in our group, after expressing each
score as a deviation from its own mean, we will have for any subject
that

$$z = x + y$$

in which $z = Z - M_z$, $x = X - M_x$, and $y = Y - M_y$.
Squaring both sides of this equation, and summing for all subjects
in the group, we find in general that

$$\Sigma z^2 = \Sigma x^2 + \Sigma y^2$$

The cross product term $2\Sigma xy$ * drops out as x and y are independent

* The formula is $r = \frac{\Sigma xy}{N \sigma_x \sigma_y}$ (p. 139). If r = 0, $\Sigma xy$ must also be zero.


Lindquist, E. F., Statistical Analysis in Educational Research (Boston:
Houghton Mifflin Co., 1940).

Snedecor, G. W., Statistical Methods (4th ed.; Ames, Iowa: Iowa State Col-
lege Press, 1946).

Goulden, C. H., Methods of Statistical Analysis (New York: John Wiley and
Sons, 1939).

Fisher, R. A., Statistical Methods for Research Workers (8th ed.; London:
Oliver and Boyd, 1941).

Fisher, R. A., The Design of Experiments (London: Oliver and Boyd, 1935).

(The Fisher references will be difficult for the beginner.)
(uncorrelated) by hypothesis. Hence we find that the sum of the
squares in x plus the sum of the squares in y equals the sum of the
squares in z. Dividing by N, we have

$$\frac{\Sigma z^2}{N} = \frac{\Sigma x^2}{N} + \frac{\Sigma y^2}{N}$$

or

$$\sigma_z^2 = \sigma_x^2 + \sigma_y^2$$

Also

$$\sigma_z = \sqrt{\sigma_x^2 + \sigma_y^2}$$

The equation in terms of variances is more convenient and more
useful than is the equation in terms of SD's. Thus if we divide each
variance by $\sigma_z^2$, we have

$$1 = \frac{\sigma_x^2}{\sigma_z^2} + \frac{\sigma_y^2}{\sigma_z^2}$$

which tells us what proportion of the variance of the composite Z is
attributable to the variance of X and what proportion is attributable
to the variance of Y. This division of total variability into its inde-
pendent components cannot be readily done with SD's.
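A small numerical sketch in Python may make the additivity of variances concrete. The four pairs of scores below are hypothetical, chosen only so that the cross products of the deviations sum exactly to zero (r = 0); the helper names are ours.

```python
def deviations(scores):
    """Deviations of each score from the mean of its own set."""
    mean = sum(scores) / len(scores)
    return [s - mean for s in scores]


def variance(scores):
    """Variance as the sum of squared deviations divided by N."""
    return sum(d * d for d in deviations(scores)) / len(scores)


X = [9, 11, 9, 11]   # hypothetical scores on test X
Y = [4, 4, 6, 6]     # hypothetical scores on test Y, uncorrelated with X
Z = [x + y for x, y in zip(X, Y)]

cross_products = sum(dx * dy for dx, dy in zip(deviations(X), deviations(Y)))
var_x, var_y, var_z = variance(X), variance(Y), variance(Z)

share_x = var_x / var_z   # proportion of the variance of Z attributable to X
share_y = var_y / var_z   # proportion of the variance of Z attributable to Y

print(cross_products, var_x, var_y, var_z, share_x, share_y)
print(var_z ** 0.5, var_x ** 0.5 + var_y ** 0.5)  # the SD's do not add
```

The variances of X and Y sum to the variance of Z, and the two shares sum to 1, whereas the SD of Z is smaller than the sum of the two SD's.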


2. When two sets of scores are combined into a single distribution 


The breakdown of total variability into its contributing parts may
be approached in another way. When two sets of scores, A and B,
are thrown together or combined into a single distribution (see
p. 57), the sum of the squares of all of the scores taken from the $M_T$
of the single total distribution is related to the component distribu-
tions A and B as follows:

$$\Sigma x_T^2 = \Sigma x_A^2 + \Sigma x_B^2 + N_A d_A^2 + N_B d_B^2$$

where $\Sigma x_T^2$ = SS * of deviations in total distribution T from $M_T$
      $\Sigma x_A^2$ = SS of deviations in distribution A from $M_A$
      $\Sigma x_B^2$ = SS of deviations in distribution B from $M_B$


$N_A$ and $N_B$ are the numbers of scores in distributions A and B,
respectively; $d_A$ and $d_B$ are the deviations of the means of A and B
from the mean of T, i.e., $(M_A - M_T)^2 = d_A^2$, $(M_B - M_T)^2 = d_B^2$.

The equation given above in terms of $\Sigma x_T^2$ is important in the
present connection because it shows that the sum of the squares of
deviations around the mean of a single distribution made up of two
component distributions can be broken down into two parts: (1) the
SS around the M's of the two sets of scores, viz., $M_A$ and $M_B$, and
(2) the sum of squares (times the appropriate N's) of the deviations
of $M_A$ and $M_B$ from $M_T$. An illustration will make the application of
this result to variance analysis clearer.

* SS = sum of squares.

Table 34 shows three sets of scores, 5 for group A, 10 for group
B, and 15 for group T which is made up of A and B. The sums of
scores, the means and SS around the M's have been calculated for
each group. It may be noted that

$$M_T = \frac{18 \times 5 + 21 \times 10}{15} = 20$$

and that, in general,

$$M_T = \frac{M_A \times N_A + M_B \times N_B}{N_A + N_B} \quad \text{(p. 31)}$$

TABLE 34 A and B are two distributions and T is a combination of the
two


        Distribution A     Distribution B     Distribution T (A + B)

              25                 17                    25
              15                 20                    15
              18                 26                    18
              22                 18                    22
              10                 20                    10
                                 25                    17
                                 19                    20
                                 26                    26
                                 18                    18
                                 21                    20
                                                       25
                                                       19
                                                       26
                                                       18
                                                       21

Sum           90                210                   300
M             18                 21                    20
Σx²          138                106                   274

Substituting the data from Table 34 in the sum-of-squares equation
above we find that


   274 = 138 + 106 + 5(18 − 20)² + 10(21 − 20)²
or 274 = 138 + 106 + 20 + 10

Of the total SS (274), 244 (138 + 106) is contributed by the variabil-
ity within the two distributions A and B, and 30 (20 + 10) is con-
tributed by the variability between the means of the two distribu-
tions. This breakdown of total SS into the SS's within component
distributions and between the M's of the combining distributions is
fundamental to analysis of variance. The method whereby SS's can
be expressed as variances will be shown later.
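The breakdown of the total SS for Table 34 can be verified with a brief Python sketch. The scores are those of Table 34; the helper name `sum_of_squares` is our own.

```python
def sum_of_squares(scores):
    """Sum of squared deviations of the scores around their own mean."""
    mean = sum(scores) / len(scores)
    return sum((s - mean) ** 2 for s in scores)


A = [25, 15, 18, 22, 10]
B = [17, 20, 26, 18, 20, 25, 19, 26, 18, 21]
T = A + B  # the combined distribution

mean_a = sum(A) / len(A)
mean_b = sum(B) / len(B)
mean_t = sum(T) / len(T)

# SS within the two component distributions
ss_within = sum_of_squares(A) + sum_of_squares(B)

# SS between the means: each squared deviation weighted by its N
ss_between = len(A) * (mean_a - mean_t) ** 2 + len(B) * (mean_b - mean_t) ** 2

ss_total = sum_of_squares(T)

print(ss_within, ss_between, ss_total)
```

The within and between parts together reproduce the total SS taken directly around the mean of T.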


II. The Significance of the Difference between Means
Derived from Independent or Uncorrelated
Measures or Scores


1. When there are more than two means to be compared 


The value of analysis of variance in testing experimental hypothe- 
ses is most strikingly demonstrated in those problems in which the 
significance of the differences among several means is desired. An 
example will illustrate the procedures and will provide a basis for the 
discussion of certain theoretical points. 


Example (1) Assume that we wish to study the effects of eight
different experimental conditions, designated A, B, C, D, E, F, G,
H, upon performance on a sensory-motor task. From a total of 48
subjects, 6 are assigned at random to each of 8 groups and the same
test is administered to all. Do the mean scores achieved under the
8 experimental conditions differ significantly?


Records for the 8 groups are shown in parallel columns in Table 35.
Individual scores are listed under the 8 headings which designate the
conditions under which the test was given. Since "conditions" fur-
nishes the category for the assignment of subjects, in the terminology
of analysis of variance there is said to be one criterion of classifica-
tion. The first step in our analysis is a breakdown of the total vari-
ance (σ²) of the 48 scores into two parts: (1) the variance attributa-
ble to the different conditions, or the variance among the 8 means,
and (2) the variance arising from individual differences within the
8 groups. The next step is to determine whether the group means
differ significantly inter se in view of the variability within the sep-
arate groups (individual differences). A detailed account of the
calculations required (see Table 35) is set forth in the steps on
pages 275-279.




TABLE 35 A hypothetical experiment in which 48 subjects are assigned
at random to 8 groups of 6 subjects each. These groups are
tested under 8 different experimental conditions, designated
respectively A, B, C, D, E, F, G and H.


                              Conditions

          A     B     C     D     E     F     G     H
         64    73    77    78    63    75    78    55
         72    61    83    91    65    93    46    66
         68    90    97    97    44    78    41    49
         77    80    69    82    77    71    50    64
         56    97    79    85    65    63    69    70
         95    67    87    77    76    76    82    68
Sums    432   468   492   510   390   456   366   372   Grand Sum: 3486
M's      72    78    82    85    65    76    61    62   General Mean = 72.63


A. Calculation of Sums of Squares

Step 1 Correction term (C) = (3486)²/48 = 253,171

Step 2 Total Sum of Squares
       = (64² + 72² + . . . + 70² + 68²) − C
       = 262,364 − 253,171 = 9193

Step 3 Sum of Squares among Means of A, B, C, D, E, F, G and H
       = [(432)² + (468)² + (492)² + (510)² + (390)² + (456)²
          + (366)² + (372)²]/6 − C
       = 256,698 − 253,171 = 3527

Step 4 Sum of Squares within Conditions A, B, C, D, E, F, G and H
       = Total SS − Among Means SS
       = 9193 − 3527 = 5666


B. Summary: Analysis of Variance

                                    Sums of    Mean Square
Source of Variation          df     Squares    (Variance)      SD

Among the means of
  Conditions                  7      3527        503.9
Within Conditions            40      5666        141.6        11.9

Total                        47      9193

F = 503.9/141.6 = 3.56        From Table F for n₁ = 7 and n₂ = 40
                              F at .05 = 2.25
                              F at .01 = 3.12


C. Tests of Differences by Use of t

For df = 40, t.05 = 2.02 (Table D)
             t.01 = 2.71

SE_D = 11.9 × √(1/6 + 1/6)
     = 11.9 × .577
     = 6.87

D.05 = 2.02 × 6.87 = 13.9
D.01 = 2.71 × 6.87 = 18.6

Largest difference is between D and G = 24
Smallest difference is between G and H = 1

Distribution of
mean differences     f
     22-24           2       Approximately 5 differences sig-
     19-21           3       nificant at .01 level
     16-18           3       Approximately 10 differences sig-
     13-15           4       nificant at .05 level
     10-12           4
      7-9            3
      4-6            5
      1-3            4
                N = 28


Step 1

Correction term (C). When the SD is calculated from original
measures or raw scores,* the formula $SD^2 = \frac{\Sigma x'^2}{N} - c^2$ becomes
$SD^2 = \frac{\Sigma X^2}{N} - M^2$. The correction (c) equals M directly in this form
of the equation, since c = M − AM and the AM (assumed mean)
here is zero. Replacing σ² by $\frac{\Sigma x^2}{N}$ we have that
$\frac{\Sigma x^2}{N} = \frac{\Sigma X^2}{N} - M^2$. Now if the correction term M² is written
$\left(\frac{\Sigma X}{N}\right)^2$ we may multiply this equation through by N to find that
$\Sigma x^2 = \Sigma X^2 - \frac{(\Sigma X)^2}{N}$. In Table 35 the correction term
$\frac{(\Sigma X)^2}{N}$ is 253,171. This correction is applied to the sum of
squares, ΣX².


* See page 54. In analysis of variance calculations it is usually more con-
venient to work with original measures or raw scores.


Step 2 


Total sum of squares around the general mean. Since
$\Sigma x^2 = \Sigma X^2 - \frac{(\Sigma X)^2}{N}$, we need only square and sum the original scores
and subtract the correction term to find $SS_T$ (sum of squares around
the general mean of all 48 scores). In Table 35, squaring each score
and summing we get a total of 262,364; and subtracting the correc-
tion, the final $SS_T$ is 9193. This $SS_T$ may also be computed by taking
deviations around the general mean directly. The general mean is
72.63. Subtracting 72.63 from each of the 48 scores, squaring these
x's and summing we get 9193, checking the calculations from raw
scores. The formula for sum of squares around the general mean is

$$SS_T = \Sigma X^2 - \frac{(\Sigma X)^2}{N} \qquad (69)$$

($SS_T$ around general mean using raw scores)
Step 3 


Sum of squares among the means obtained under the 8 conditions.
To find the sum of squares attributable to condition-differences
($SS_{M's}$), we must first square the sum of each column (i.e., each con-
dition), add these squares and divide the total by 6, the number of
scores in each group or column. Subtracting the correction found in
Step 1, we then get the final $SS_{M's}$ to be 3527. This $SS_{M's}$ is simply the
SS of the separate group M's around the one general mean, multiplied
by the number of scores in each column. We may carry out these
calculations as a check on the result above. Thus for the present
example:

$$SS_{M's} = 6[(72 - 72.63)^2 + (78 - 72.63)^2 + (82 - 72.63)^2 + (85 - 72.63)^2 + (65 - 72.63)^2 + (76 - 72.63)^2 + (61 - 72.63)^2 + (62 - 72.63)^2] = 3527$$

When, as here, we are working with raw scores, the method of calcu-
lation repeats Step 2, except that we divide the square of each column
total by 6, the number of scores in each column, before subtracting C.
The general formula is

$$SS \text{ (among means)} = \frac{(\Sigma X_1)^2}{n_1} + \frac{(\Sigma X_2)^2}{n_2} + \cdots + \frac{(\Sigma X_k)^2}{n_k} - C \qquad (70)$$

(SS among means when calculation is with raw scores)


When the numbers of scores in the groups differ, the squares of the
column sums will be divided by different n's before the correction is
subtracted. (See page 282 for illustration.)


Step 4 


Sum of squares within conditions (individual differences). The SS
within columns or groups ($SS_W$) always equals the $SS_T$ minus the
$SS_{M's}$. Subtracting 3527 from 9193, we have 5666. This $SS_W$ may also
be calculated directly from the data (see p. 296).


Step 5 


Calculation of the variances from each SS and analysis of the total
variance into its components is shown in the B part of Table 35.
Each SS becomes a variance when divided by the degrees of free-
dom (df) allotted to it (p. 193). There are 48 scores in all in Table 35,
and hence there are (N − 1) or 47 df in all. These 47 df are allocated
in the following way. The df for "among the means of conditions"
are (8 − 1) or 7, less by one than the number of conditions. The df
within groups or within conditions are (47 − 7) or 40. This last
df may also be found directly: since there are (6 − 1) or 5 df for
each condition (n = 6 in each group), 5 × 8 (number of conditions)
gives 40 df for within groups. The variance among M's of groups
is 3527/7 or 503.9; and the variance within groups is 5666/40 or
141.6.

If N = number of scores in all and k = number of categories or
groups, we have for the general case that


df for total SS                  = (N − 1)

df for within groups SS          = (N − k)

df for among means of groups SS  = (k − 1)

Also: (N − 1) = (N − k) + (k − 1)

Step 6 


In the present problem the null hypothesis asserts that the 8 sets
of scores are in reality random samples drawn from the same nor-
mally distributed population, and that the means of conditions A, B,
C, D, E, F, G and H will differ only through fluctuations of sampling.
To test this hypothesis we divide the "among means" variance by the
"within groups" variance and compare the resulting variance ratio,
called F, with the F-values in Table F (see p. 429). The F in our
problem is 3.56 and the df are 7 for the numerator (n₁) and 40 for the
denominator (n₂). Entering Table F, we read from column 7 (n₁)
and row 40 (n₂) that an F of 2.25 is significant at the .05 level and
an F of 3.12 is significant at the .01 level. Only the .05 and .01 points
are given in the table. These entries mean that, for the given df's,
variance ratios or F's of 2.25 and 3.12 can be expected once in 20 and
once in 100 trials, respectively, when the null hypothesis is true.
Since our F of 3.56 is larger than the 3.12 required at the .01 level,
it would occur less than once in 100 trials by chance. We reject the
null hypothesis, therefore, and conclude that the means of our 8
groups do in fact differ.
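Steps 1 through 6 can be retraced in a short Python sketch that works only from quantities stated in the text: the eight column sums, the 6 scores in each group, and the sum of the squared raw scores (262,364). The variable names are ours. The sum for condition G is taken as 366, the figure used in Step 3 and the one that agrees with the mean of 61 and the grand sum of 3486. The two critical values are those read from Table F and are not computed here.

```python
column_sums = {"A": 432, "B": 468, "C": 492, "D": 510,
               "E": 390, "F": 456, "G": 366, "H": 372}
n_per_group = 6
sum_of_squared_scores = 262_364   # sum of X squared over all 48 scores (Step 2)

k = len(column_sums)
n_total = k * n_per_group
grand_sum = sum(column_sums.values())

# Step 1: correction term
correction = grand_sum ** 2 / n_total

# Steps 2-4: total, among-means and within-groups sums of squares
ss_total = sum_of_squared_scores - correction
ss_among = sum(s ** 2 for s in column_sums.values()) / n_per_group - correction
ss_within = ss_total - ss_among

# Step 5: degrees of freedom and variances (mean squares)
df_among = k - 1
df_within = n_total - k
ms_among = ss_among / df_among
ms_within = ss_within / df_within

# Step 6: variance ratio, compared with the Table F entries for 7 and 40 df
f_ratio = ms_among / ms_within
f_05, f_01 = 2.25, 3.12

print(f"C = {correction:.0f}, SS total = {ss_total:.0f}, "
      f"SS among = {ss_among:.0f}, SS within = {ss_within:.0f}")
print(f"F = {f_ratio:.2f}; beyond .05 point: {f_ratio > f_05}; "
      f"beyond .01 point: {f_ratio > f_01}")
```

Because the groups are of equal size, a single divisor of 6 serves for every column; with unequal groups each squared column sum would be divided by its own n.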

F furnishes a comprehensive or over-all test of the significance of
the differences among means. A significant F does not tell us which
means differ significantly, but that at least one is reliably different
from some others. If F is not significant there is no reason for further
testing, as none of the mean differences will be significant (see p. 281).
But if F is significant, we may proceed to test the separate differences
by the t-test (p. 427) as shown in Table 35 C.


Step 7 


The best estimate which we can make of the uncontrolled variabil-
ity arising from individual differences is given by the SD of 11.9
computed from the "within groups" variance given in Table 35 B.
This SD is based upon all of our data and is a measure of subject
variability after the systematic effects arising from differences in
column means have been allowed for. In testing mean differences by
the t-test, therefore (Table 35 C), the SD of 11.9 is used throughout
instead of the SD's calculated from the separate columns, A, B, C, D,
E, F, G and H. The standard error of any mean ($SE_M$) is $SD_W/\sqrt{N}$
or $11.9/\sqrt{6} = 4.86$. And the SE of the difference (D) between any
two means is $SE_D = \sqrt{4.86^2 + 4.86^2}$ or 6.87. A general formula for
calculating $SE_D$ directly is

$$SE_D = SD_W \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \qquad (71)$$

(standard error of the difference between any two means in
analysis of variance)


where $SD_W$ is the within-groups SD, and $n_1$ and $n_2$ are the sizes of the
samples or groups being compared.

The means of the 8 groups in Table 35 range from 61 to 85, and
the mean differences from 24 to 1. To determine the significance of
the difference between any two selected means we must compute a
t-ratio by dividing the given mean difference by its SE_D. The resulting
t is then compared with the t in Table D for 40 df, viz., the number
of df upon which our SD_w is based. A more summary approach
than this is to compute that difference among means which for 40 df
will be significant at the .05 or the .01 level and check our differences
against these standards. This is done in Table 35 C. We
know from Table D that for 40 df, a t of 2.02 is significant at the .05
level; and a t of 2.71 is significant at the .01 level. Since t = mean
difference/SE_D, we may substitute 2.02 for t in this equation and
6.87 for SE_D to find that a difference of 13.9 is significant at the .05
level. Using the same procedure, we substitute 2.71 for t in the equation
to find that a difference of 18.6 is significant at the .01 level.
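
The arithmetic of this shortcut can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `critical_difference` is hypothetical, and the t values (2.02 and 2.71 for 40 df) are copied from Table D rather than computed.

```python
import math

def critical_difference(sd_within, n1, n2, t_critical):
    """Smallest mean difference that reaches the level implied by t_critical.

    Uses formula (71): SE_D = SD_w * sqrt(1/n1 + 1/n2).
    """
    se_d = sd_within * math.sqrt(1.0 / n1 + 1.0 / n2)
    return t_critical * se_d

# Values taken from the text: SD_w = 11.9, six scores per group, 40 df.
SD_W = 11.9
N_PER_GROUP = 6

se_d = SD_W * math.sqrt(1.0 / N_PER_GROUP + 1.0 / N_PER_GROUP)
diff_05 = critical_difference(SD_W, N_PER_GROUP, N_PER_GROUP, 2.02)  # t at .05 (Table D)
diff_01 = critical_difference(SD_W, N_PER_GROUP, N_PER_GROUP, 2.71)  # t at .01 (Table D)

print(round(se_d, 2), round(diff_05, 1), round(diff_01, 1))
```

Any observed difference between two of the 8 means can then be compared directly with `diff_05` and `diff_01` instead of computing a separate t for each pair.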


Eight means will yield (8 × 7)/2 or 28 differences. From the distribution
of these 28 differences (Table 35 C) it is clear that approximately
5 differences are significant at the .01 level (i.e., are 18.6 or
more); and approximately 10 at the .05 level (i.e., are 13.9 or more).
The largest difference is 24 and the smallest is 1.


Discussion *

A few additional comments may clarify the calculations in Table 
35. 

(1) First, it must be remembered that we are testing the null 
hypothesis—the hypothesis that there are no true differences among 
our 8 condition-means. Stated differently, we are testing the hy- 
pothesis that our 8 groups are in reality random samples drawn from 
the same normally distributed population. The F-test refutes the 
null hypothesis by demonstrating differences among our means which 
cannot be explained by chance: i.e., differences larger than those
which would occur by sampling accidents once in 100 trials if the 
null hypothesis were true. 

(2) The 47 df (48—1) in the table are broken down into 7 df 
which are allotted to the 8 condition-means and 40 df which are 
allotted to individual differences (variations within groups or col- 
umns). Variances are calculated by dividing each SS by its own df. 

(3) In problems like that of Table 35 (where there is only one 
criterion of classification), all 3 variances (total, among means and 
within groups) are in effect estimates of the variance in the popula- 
tion of scores from which our 8 samples are drawn. Only two of these 
variances are independent: the variance among condition-means and 
the variance within groups, since the total variance is composed of these two. These
two independent estimates of population variance are used in com- 
puting the variance ratio and making the F-test. When samples are 
strictly random these two variances are equal and F is 1.00. More- 
over, when F is 1.00, the variance among group means is no greater 
than the variance within groups; or, put differently, group-means 
differ no more than do the individuals within the groups. The extent 
to which F is greater than 1.00 becomes, then, a measure of the sig- 
nificance of the differences among group means. The larger the F the 
greater the probability that group mean differences are greater than 
individual variation—sometimes called “experimental error.” 

(4) According to the traditional method of treating a problem like 
that of Table 35, 8 SD’s would first be computed, one around each of 
the 8 column means. From these SD's, SE's of the means and SE's of
the differences between pairs of means would be calculated. A t-test 
would then be made of the differences between any two given means 
and the significance of this difference determined from Table D. 

Analysis of variance is an improvement over this procedure in several
respects. In Table 35 we first compute an F-ratio which tells us
whether any mean differences are significant. If F is significant, we
may then compute a single SE_D. This SE_D is derived from the SD_w
calculated from the 8 groups after systematic mean-differences have
been removed. Moreover, this within-groups SD, based as it is upon
all 48 scores and with 40 df, furnishes a better (i.e., more reliable)
measure of uncontrolled (or experimental) variation in the table
than could be obtained from SD's based upon only 6 scores and 5 df.
Pooling of sums to obtain the within-groups SD is permissible, since
the deviations in each group have been taken from their own mean.


* See Garrett, H. E., and Zubin, J., "The Analysis of Variance in Psychological
Research," Psychol. Bull., 1943, 40, 233-267.


(5) If the F-test refutes the null hypothesis we may use the t-test 
to evaluate mean-differences. If the F-test does not refute the null 
hypothesis there is no justification for further testing, as differences 
between pairs of means will not differ significantly unless there are a 
number of them, in which case one or two might by chance equal or
approach significance.* 


2. When there are only two means to be compared 


In order to provide a further comparison of analysis of variance 
with the methods of Chapter 9, example (4), page 223, is solved in 
Table 36. This second example will show that when only two means 
are to be compared, the F-test reduces to the t-test. 


TABLE 36 Solution of Example (4), page 223, through methods of analy- 
sis of variance 


Scores:
          Class 1 (N_1 = 6)      Class 2 (N_2 = 10)
                28                     20
                35                     16
                32                     25
                24                     34
                26                     20
                35                     28
                                       31
                                       24
                                       27
                                       15
Sum            180                    240
Mean           M_1 = 30               M_2 = 24


* In 100 strictly random differences, 5 will be significant at the .05 level; that
is, 2½% will exceed 1.96σ at each end of the curve of differences (p. 188).
Hence in 28 differences (Table 35 C) 1 or 2 might be significant at the
.05 level (28 × .05 = 1.40) if differences are randomly distributed around zero.

TABLE 36 (Continued)


A. Sums of Squares

1. Correction: C = (420)²/16 = 11025

2. SS_t = 28² + 35² + ... + 15² - C
        = 11622 - 11025 = 597

3. SS between means = (180)²/6 + (240)²/10 - C
                    = 11160 - 11025 = 135

4. SS_w = 597 - 135 = 462


B. Analysis of Variance


Source             df     SS     MS (V)
Between means       1    135     135
Within classes     14    462      33

Total              15    597

F = 135/33 = 4.09            From Table F
t = √F = 2.02                F at .05 level = 4.60
                             F at .01 level = 8.86


Step 1

The sum of all of the 16 scores is 180 + 240 or 420 ; and the correc- 
tion (C) is, accordingly, (420)2/16 or 11025. See page 275. 
Step 2 


When each score has been squared and the correction subtracted 
from the total, the SS around the general mean is 597 by formula 
(69), page 276. 


Step 3 


The sum of squares between means (135) is found by squaring
the sum of each column, dividing the first by 6 (n_1) and the second
by 10 (n_2) and subtracting C.


Step 4 


The SS within groups is the difference between the SS_t and the
SS between means. Thus SS_w = 597 - 135 = 462.


Step 5


The analysis of variance is shown in Table 36 B. SS_t is divided
into SS between means of groups and SS within groups. Since there
are 16 scores in all, there are (N - 1) or 15 df for "total." The SS between
means is allotted (k - 1) or 1 df (k = 2). The remaining 14 df are assigned
to within groups and may be found either by subtracting 1 from 15
or by adding the 5 df in Class 1 to the 9 df in Class 2. Mean squares
or variances are obtained by dividing each SS by its appropriate df.


Step 6 


The variance ratio or F is 135/33 or 4.09. The df for between
means is 1 (n_1) and the df for within groups is 14 (n_2). Entering
Table F with these n's we read in column 1 and row 14 that the .05
level is 4.60 and the .01 level is 8.86. Our F of 4.09 does not quite
reach the .05 level so that our mean difference of 6 points must be
regarded as not significant. The difference between the two means
(30 - 24) is not large enough, therefore, to be convincing; or, stated
more mathematically, a difference of 6 can be expected to occur too
frequently to render the null hypothesis untenable.

When there are only two means to be compared as here, F = t² or
t = √F and the two tests (F and t) give exactly the same result. In
Table 36 B, for instance, t = √4.09 or 2.02, which is the t previously
found in example (4) on page 223. From Table D we have found
(p. 224) that for 14 df the .05 level of significance for this t is 2.14.
Our t of 2.02 does not quite reach this level and hence (like F) is not
significant. If we interpolate between the .05 point of 2.14 and the
.10 point of 1.76 in Table D, our t of 2.02 is found to fall approximately
at .07. In 100 repetitions of this experiment, therefore, we
can expect a mean difference of 6 or more to occur about 7 times,
too frequently to be significant.
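
The steps of Table 36 can be retraced with a short Python sketch. It is an illustration only: the function name `one_way_anova` is hypothetical, and the code simply follows the correction-term method described above (total SS, SS between means, SS within by subtraction) using the scores of the two classes.

```python
import math

def one_way_anova(groups):
    """Return (ss_between, ss_within, df_between, df_within, F) for a list of score lists."""
    all_scores = [x for g in groups for x in g]
    n_total = len(all_scores)
    correction = sum(all_scores) ** 2 / n_total              # C = (sum of X)^2 / N
    ss_total = sum(x * x for x in all_scores) - correction   # SS around the general mean
    ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    ss_within = ss_total - ss_between
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_ratio

class_1 = [28, 35, 32, 24, 26, 35]
class_2 = [20, 16, 25, 34, 20, 28, 31, 24, 27, 15]

ss_b, ss_w, df_b, df_w, f_ratio = one_way_anova([class_1, class_2])
t_ratio = math.sqrt(f_ratio)  # F = t^2 holds only when two means are compared

print(ss_b, ss_w, df_b, df_w, round(f_ratio, 2), round(t_ratio, 2))
```

The same function accepts any number of groups, but the square root of F may be read as a t only in the two-group case.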


3. Example (5), page 225, solved by analysis of variance 


In problems requiring the comparison of two group means either
F or t may be employed. From the standpoint of calculation, F is
perhaps somewhat easier to apply. In example (5), page 225, it is
easier to calculate t because raw scores are not given. But F may
be calculated if desired in the following way. The general mean for
the two groups is (40.39 × 31 + 35.81 × 42) divided by 73, or 37.75:
it is the weighted mean obtained from the two group means.
The SS between the means of the groups of boys and girls is
31(40.39 - 37.75)² + 42(35.81 - 37.75)² or 374.13; namely, the squared
deviation of each group mean from the general mean weighted in each
case by the N of the group.

To get the SS within groups we simply square each SD and multiply
by (N - 1), remembering that SD² = Σx²/(N - 1) (p. 189). In
example (5) we find that (8.69)² × 30 = 2265.48; and (8.33)² × 41
= 2844.95. The sum of these two is 5110.43, the SS within groups.
The complete analysis of variance and F test are shown in Table 37;
F = 5.20 and t = √F or 2.28, checking the result given on page 225.
Our F of 5.20 exceeds the .05 level of 3.98 but does not reach the .01
level of 7.01. As before, F and t give identical results.


TABLE 37 Solution of example (5), page 225, by analysis of variance 


A. Sums of Squares and General Mean

1. General mean = (40.39 × 31 + 35.81 × 42)/73 = 37.75
2. SS between means:
      31(40.39 - 37.75)² + 42(35.81 - 37.75)² = 374.13
3. SS within groups:
      30(8.69)² + 41(8.33)² = 5110.43

B. Analysis of Variance

Source of Variation    df    Sums of Squares    Mean Square (Variance)
Between means           1        374.13               374.1
Within groups          71       5110.43                71.9

F = 374.1/71.9 = 5.20            From Table F (df = 1/71)
t = √F = √5.20 = 2.28            F at .05 = 3.98
                                 F at .01 = 7.01
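
When only means, SD's and N's are available, as in example (5), the same F can be obtained from the summary figures alone. The following Python sketch illustrates this; the function name `f_from_summary` is hypothetical, and the SD's are assumed to have been computed with (N - 1) in the denominator, as the text states.

```python
import math

def f_from_summary(means, sds, ns):
    """One-way F from group means, SDs (computed with N - 1) and group sizes."""
    n_total = sum(ns)
    # Weighted general mean of the group means
    general_mean = sum(m * n for m, n in zip(means, ns)) / n_total
    # Each group mean's squared deviation from the general mean, weighted by N
    ss_between = sum(n * (m - general_mean) ** 2 for m, n in zip(means, ns))
    # SD^2 * (N - 1) recovers each group's sum of squares
    ss_within = sum((n - 1) * sd ** 2 for sd, n in zip(sds, ns))
    df_between = len(means) - 1
    df_within = n_total - len(means)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return general_mean, ss_between, ss_within, f_ratio

gm, ss_b, ss_w, f_ratio = f_from_summary([40.39, 35.81], [8.69, 8.33], [31, 42])
print(round(gm, 2), round(ss_b, 2), round(ss_w, 2),
      round(f_ratio, 2), round(math.sqrt(f_ratio), 2))
```

Because the code carries the general mean unrounded, its intermediate figures may differ from the hand calculation in the last decimal place.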
III. The Significance of the Difference between Means
Obtained from Correlated Groups


1. When the same group is measured more than once
(single group method)


Means are correlated when the two sets of scores achieved by the
group from which the means were derived are correlated. When
a test is given and then repeated, analysis of variance may be
used to determine whether the mean change is significant. The
experimental design here is essentially the same as that of the Single
Group Method of Chapter 9, page 225. Hence example (7), page 227,
is used in Table 38 to illustrate the methods of analysis of variance
and to provide a comparison with the difference-method of page 227.


TABLE 38 Solution of example (7), page 227, by analysis of variance


A. Sums of Squares

1. Correction = (1240)²/24 = 1537600/24 = 64066.67
2. Total sum of squares = 68952 - 64066.67 = 4885.33
3. Between trials sum of squares:
      [(572)² + (668)²]/12 - 64066.67 = 384.00
4. Among subjects' sum of squares:
      68391 - 64066.67 = 4324.33
5. Interaction sum of squares = 4885.33 - (384.00 + 4324.33) = 177

B. Analysis of Variance

Source of Variation    df    Sums of Squares    Mean Square (Variance)    SD
Between trials          1        384.00               384.00
Among subjects         11       4324.33               393.12
Interaction            11        177.00                16.09              4.01
Total                  23       4885.33

F_trials   = 384.00/16.09 = 23.86        t = √23.86 = 4.88
F_subjects = 393.12/16.09 = 24.43

From Table F        Trials          Subjects
                    df = 1/11       df = 11/11
F at .05            4.84            2.82
F at .01            9.65            4.46


The procedures for the analysis of variance in example (7) differ
in at least two ways from the methods of Section II. First, since
there is the possibility of correlation between the scores achieved by
the 12 subjects on the first and fifth trials, the two sets of scores
should not at the outset be treated as independent (random) samples.
Secondly, classification is now in terms of two criteria: (a)
trials and (b) subjects. Because of these two criteria, the total SS
must be broken down into three parts: (a) SS attributable to trials;
(b) SS attributable to subjects; and (c) a residual SS usually called
"interaction." Steps in the calculation of these three variances,
shown in Table 38 A, may be summarized as follows.


Step 1


Correction (C). As in Section II, C = (ΣX)²/N. In example (7) C is
(1240)²/24 or 64066.67.

Step 2 


Total SS around general mean. Again the calculation repeats 
the procedure of Section II. 


SS_t = (50² + 42² + ... + 72² + 50²) - 64066.67
     = 68952 - 64066.67 = 4885.33


Step 3 


SS between the means of trials. There are two trials of 12 scores
each. Therefore,


SS_trials = [(572)² + (668)²]/12 - 64066.67
          = 64450.67 - 64066.67 = 384.00


Step 4 


SS among the means of subjects. A second "between means" SS
is required to take care of the second criterion of classification.
There are 12 subjects and each has two trials. Hence,


SS_subjects = (... + 134² + 88²)/2 - 64066.67
            = 68391.00 - 64066.67 = 4324.33


Step 5 


Interaction SS. The residual variation or interaction is whatever 
is left when the systematic effects of trial differences and subject 
differences have been removed from the total SS. Interaction meas- 
ures the tendency for subject performance to vary along with trials: 
it measures the factors attributable neither to subjects nor trials 
acting alone, but rather to both acting together. Interaction is 
obtained most simply * by subtracting trials SS plus subjects SS from
total SS. Thus


Interaction SS = SS_t - (SS_trials + SS_subjects)
               = 4885.33 - (384 + 4324.33)
               = 177


Step 6 


As before, SS's become variances when divided by their appropriate
df. Since there are 24 scores in all we have (24 - 1) or 23 df
for the total SS. Two trials receive 1 df, and 12 subjects, 11. The
remaining 11 df are assigned to interaction. The rule is that the df
for interaction is the product of the df for the two interacting variables,
here 1 × 11. In general if N = total number of scores,
r = rows and k = columns, we have


df for total SS             = (N - 1)
df for column SS (trials)   = (k - 1)
df for row SS (subjects)    = (r - 1)
df for interaction SS       = (k - 1)(r - 1)

The three measures of variance appear in Table 38. Note that we
may now calculate two F's, one for trial differences and one for subject
differences. In both cases the interaction variance is placed in
the denominator of the variance ratio, since it is our best estimate of
residual variance (or experimental error) after the systematic influences
of trials and subjects have been removed. The F for trials is
23.86 and is much larger than the 9.65 we find in Table F for the .01
point when n_1 = 1 and n_2 = 11. This means that the null hypothesis
with respect to trials is untenable and must be abandoned. The
evidence is strong that real improvement took place from trial 1 to
trial 5.

* Interaction may be calculated directly from the data.

Ordinarily in most two-criteria experiments we are concerned 
primarily with one criterion, as here. It is possible, however (and 
sometimes desirable), to test the second criterion—viz., differences 
among subjects. The F for subjects is 24.43 and again is far larger
than the .01 point of 4.46 in Table F for n_1 = 11 and n_2 = 11. It is
obvious that some subjects were consistently better than others
without regard to trial. 

Since there are two trials, we have two trial means. Hence, if we
compute a t from the F for trials, it should be equal to that found by
the difference-method. The F of 23.86 yields a t of √23.86 or 4.88
which checks the t of 4.88 on page 227.
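
The breakdown of Table 38 can be retraced from its sums of squares with the Python sketch below. The function name `two_criteria_f` is hypothetical, and because the raw scores of example (7) are not reproduced in this section, the code starts from the three SS's already computed in Table 38 A rather than from the data.

```python
import math

def two_criteria_f(ss_total, ss_columns, ss_rows, n_columns, n_rows):
    """F ratios for a rows x columns table with one score per cell.

    Interaction SS is the residual: total SS minus column SS minus row SS.
    Both F's use the interaction variance as denominator.
    """
    ss_interaction = ss_total - (ss_columns + ss_rows)
    df_columns = n_columns - 1
    df_rows = n_rows - 1
    df_interaction = df_columns * df_rows
    ms_interaction = ss_interaction / df_interaction
    f_columns = (ss_columns / df_columns) / ms_interaction
    f_rows = (ss_rows / df_rows) / ms_interaction
    return ss_interaction, df_interaction, f_columns, f_rows

# Trials are the columns (k = 2); subjects are the rows (r = 12).
ss_int, df_int, f_trials, f_subjects = two_criteria_f(
    ss_total=4885.33, ss_columns=384.00, ss_rows=4324.33, n_columns=2, n_rows=12
)
t_trials = math.sqrt(f_trials)

print(round(ss_int, 2), df_int, round(f_trials, 2), round(f_subjects, 2))
```

Carried to full precision, the square root of F for trials agrees with the tabled t of 4.88 to within rounding in the last decimal place.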

Computations needed for the difference-method of example (7), 
page 227, are somewhat shorter than are those for analysis of vari- 
ance, and the difference-method would probably be preferred if one 
wished to determine only the significance of the difference between 
the two trial means. If, however, the significance of the differences 
in the second criterion (differences among subject means) is wanted, 
analysis of variance is more useful. Moreover, through a further 
analysis of variance we can determine whether individual differences 
(differences among subjects) are significantly greater than practice 
differences (differences between trials). Thus if we divide the
V_subjects by the V_trials, the resulting F is 393.12/384 or 1.02. For
n_1 = 11 and n_2 = 1, the .05 point is 243. Hence, in the present experiment,
at least, we may feel quite sure that individual differences
were no greater than practice differences. Since the reverse is usually 
true, the implication to be drawn is that practice in the present 
experiment must have been quite drastic: a conclusion borne out by 
the F-test for trials. 


2. When in evaluating the differences between two or more groups on a
test we wish to allow for initial differences among the groups on
the same or different measures


In many experimental situations, especially in the fields of
memory and learning, we wish to compare groups that are initially
unlike, either in the variable under study or some presumably related
variable. In Chapter 9, two methods were given for equating groups
initially, that is, having them "start from scratch." In the first method,
experimental and control groups were made equivalent initially by
person-to-person matching; and in the second method, groups were
matched initially for mean and σ in one or more related variables.
Neither of these methods is entirely satisfactory and neither is
always easy to apply. Equivalent groups often necessitate a sharp
reduction in size of N (and also in variability) when the matching of
scores is difficult to accomplish. Furthermore, in matched groups it
is often difficult to get the correlation between the matching variable
and the experimental variable in the population from which our
samples were drawn (p. 231).

Analysis of covariance represents an extension of analysis of vari- 
ance to allow for the correlation between initial and final scores. 
Covariance analysis is especially useful to experimental psychol- 
ogists when for various reasons it is impossible or quite difficult to 
equate control and experimental groups at the start: a situation
which often obtains in actual experiments. Through covariance one 
is able to effect adjustments in final or terminal scores which will 
allow for differences in some initial variable. (For many other uses 
of covariance the reader should consult the references on page 268.) 

Table 39 presents a numerically simple illustration of the application
of analysis of covariance. The data in Example (1) are artificial
and are purposely meager so that the procedure will not be swamped
by the numerical calculations.


Example (1) Suppose that fifteen children have been given
one trial (X) of a test. Five are then assigned at random to each
of three groups, A, B and C. After two weeks, say, group A is
praised lavishly, group B scolded severely and the test repeated
(Y). At the same time, a second trial (Y) is also given to group C,
the control group, without comment.


TABLE 39 To illustrate covariance analysis


Original Data [Example (1)]

        Group A (praised)       Group B (scolded)       Group C (control)
        X_1   Y_1   X_1Y_1      X_2   Y_2   X_2Y_2      X_3   Y_3   X_3Y_3
         15    30     450        25    28     700         5    10      50
         10    20     200        10    12     120        10    15     150
         20    25     500        15    20     300        20    20     400
          5    15      75        15    10     150         5    10      50
         10    20     200        10    10     100        10    10     100
Sums     60   110    1425        75    80    1370        50    65     750
M's      12    22                15    16                10    13

For all 3 groups: ΣX = 185, ΣY = 255, ΣX² = 2775, ΣY² = 5003, ΣXY = 3545

Step 1. Correction terms:
   C_x  = (185)²/15 = 2282
   C_y  = (255)²/15 = 4335
   C_xy = (185 × 255)/15 = 3145

Step 2. Total SS
   For x  = 2775 - 2282 = 493
       y  = 5003 - 4335 = 668
       xy = 3545 - 3145 = 400

Step 3. Among Group Means SS
   For x  = (60² + 75² + 50²)/5 - 2282 = 63
       y  = (110² + 80² + 65²)/5 - 4335 = 210
       xy = (60 × 110 + 75 × 80 + 50 × 65)/5 - 3145 = 25

Step 4. Within Groups SS
   For x  = 493 - 63 = 430
       y  = 668 - 210 = 458
       xy = 400 - 25 = 375



Step 5. Analysis of Variance of X and Y scores, taken separately

Source of Variation    df    SS_x    SS_y    MS_x (V_x)    MS_y (V_y)
Among Means             2      63     210      31.5          105
Within Groups          12     430     458      35.8           38.2
Total                  14     493     668

F_x = 31.5/35.8 = .88            From Table F (df = 2/12)
F_y = 105/38.2 = 2.75            F at .05 level = 3.88
                                 F at .01 level = 6.93

Neither F is significant. Mean differences on final trial approach
significance.

F_x = .88 shows that the experimenter was quite successful in
getting random samples in Groups A, B, C.


Step 6. Computation of Adjusted SS for Y: i.e., SS_y.x

Total SS     = 668 - (400)²/493 = 343
Within SS    = 458 - (375)²/430 = 131
Among M's SS = 343 - 131 = 212


Analysis of Covariance

Source of Variation    df     SS_x    SS_y    SS_xy    SS_y.x    MS_y.x (V_y.x)    SD_y.x
Among Means             2       63     210      25      212         106
Within Groups          11*     430     458     375      131          12            3.46
Total                  13      493     668     400      343

F_y.x = 106/12 = 8.83            From Table F (df = 2/11)
                                 F at .05 level = 3.98
                                 F at .01 level = 7.20


Step 7. Correlation and Regression

r_total  = 400/√(493 × 668) = .70        b_total  = 400/493 = .81
r_among  = 25/√(63 × 210)   = .22        b_among  = 25/63   = .40
r_within = 375/√(430 × 458) = .84        b_within = 375/430 = .87

Step 8. Calculation of Adjusted Y-Means

Groups            N     M_x     M_y     M_y.x (adjusted)
A                 5     12      22        22.3
B                 5     15      16        13.7
C                 5     10      13        15.0
General Means           12.3    17        17.0


* 1 df lost, see page 294.


M_y.x = M_y - b(M_x - GM_x)
For Group A: M_y - bx = 22 - .87(12 - 12.3) = 22.3
          B: M_y - bx = 16 - .87(15 - 12.3) = 13.7
          C: M_y - bx = 13 - .87(10 - 12.3) = 15.0


Step 9. Significance of differences among adjusted Y-Means

SD_y.x = √12 = 3.46
SE_M (adjusted) = 3.46/√5 = 1.55
SE_D between any two adjusted means = SD_y.x √(1/n_1 + 1/n_2)
                                    = 3.46 √(1/5 + 1/5) = 3.46 × .63 = 2.18     (71)

For df = 11, t.05 = 2.20; t.01 = 3.11 (Table D)
Significant difference at .05 level = 2.20 × 2.18 = 4.80
Significant difference at .01 level = 3.11 × 2.18 = 6.78
A differs significantly from both B and C at .01 level.
B and C are not significantly different.


We thus have three groups—two experimental and one control— 
with initial scores (X) and final scores (Y). The problem is to deter- 
mine whether the groups differ in the final trial (Y) as a result of the 
incentives. The method permits us to determine whether initial 
differences in (X) are important and to allow for them if they are. 


Table 39 gives the necessary computations. The following steps 
outline the procedure. 


Step 1


Correction term (C). There are three correction terms to be
applied to SS's, one for X, one for Y and one for the cross products
in X and Y. Calculation of C_x and C_y follows the method of page
275. The formula for C_xy is (ΣX × ΣY)/N or in our problem (185 × 255)/15.


Step 2


SS for totals. Again we have three SS's for totals: SS_x, SS_y and
SS_xy, of which only SS_xy is new. The formula for SS_xy is

SS_{xy} = \sum XY - C_{xy}    (72)


(sum of squares for xy in analysis of covariance)


The SS_xy is found by multiplying pairs of X and Y scores, summing
over all pairs and subtracting C_xy: thus (15 × 30 + 10 × 20
+ ... + 10 × 10) - 3145 = 400.


Step 3 


SS among means of the three groups. Calculations shown in Table
39 follow the method of page 289 for X and Y. The "among means"
term for xy is the sum of the products of corresponding X and Y column
totals (e.g., 60 × 110 + 75 × 80 + 50 × 65) divided by 5 and minus C_xy.


Step 4 


SS within groups. For x, y, and xy these SS's are found by subtracting
the "among means" SS's from the total SS's.


Step 5 


A preliminary analysis of variance of the X and Y trials, taken 
separately, has been made in Table 39. The F test applied to the 
initial (X) scores (F_x = .88) falls far short of significance at the .05
level, from which it is clear that the X-means do not differ significantly
and that the random assignment of subjects to the three
groups was quite successful. The F-test applied to the final (Y) 
scores (Fy = 2.75) approaches closer to significance, but is still con- 
siderably below 3.88, the .05 level. From this preliminary analysis 
of variance of the Y-means alone we must conclude that neither 
praise nor scolding is more effective in raising scores than is mere 
repetition of the test. 


Step 6 


The computations carried out in this step are for the purpose of
correcting the final (Y) scores for differences in initial (X) scores.
The symbol SS_y.x means that the SS_y have been "adjusted" for any
variability in Y contributed by X, or that the variability in X is held
constant. The general formula (see p. 297) is

SS_{y.x} = SS_y - \frac{(SS_{xy})^2}{SS_x}    (73)


(SS in y when variability contributed by x has been removed
or held constant)


For SS_t we have that SS_y.x = 668 - (400)²/493 or 343; for SS_within
that SS_y.x = 458 - (375)²/430 = 131. The SS for among means is the
adjusted SS_t minus adjusted SS_within. This last SS_y.x cannot readily be
calculated directly.*


From the various adjusted sums of squares the variances (MS_y.x)
can now be computed by dividing each SS by its appropriate df.
Owing to the restriction imposed by the use of formula (73) (reduction
of variability in X) 1 df is lost and the analysis of covariance
(Table 39) shows only 11 df for within groups instead of 12, and
only 13 instead of 14 for total.

The value of analysis of covariance becomes apparent in Table 39
when the F-test is applied to the adjusted among and within variances.
F_y.x = 106/12 or 8.83, and is highly significant, far beyond
the .01 level (.01 = 7.20). This F_y.x should now be compared with
the F_y of 2.75 (p. 290) obtained before correcting for variability in
initial (X) scores. It is clear from F_y.x that the three final means,
which depend upon the three incentives, differ significantly after
they have been adjusted for initial differences in X. To find which
of the three possible differences is significant or whether all are
significant we must apply the t-test (in Step 9).


Step 7 


An additional step is useful, however, before we proceed to the
t-test for adjusted means. From the SS's in x, y and xy it is possible
to compute several coefficients of correlation. These are helpful
in the interpretation of the result obtained in Step 6. The general
formula used is r = Σxy/√(Σx² × Σy²) (p. 189); it may be applied to the
appropriate SS's for total, among means and within groups.


* See McNemar, Q., op. cit., p. 324.
The within groups correlation of .84 is a better measure of the
relationship between initial (X) and final (Y) scores than is the
total correlation of .70, as systematic differences in means have
been eliminated from the within r. It is this high correlation between
X and Y which accounts for the marked significance among Y-means
when the variability in X is held constant. High correlation within
groups reduces the denominator of the variance ratio, F_y.x, while low
correlation between X and Y means (namely, .22) does not proportionally
affect the numerator. Thus we note that the within groups
variance of 38.2 is reduced through analysis of covariance to 12, 
while the among means variance is virtually unchanged (from 105 
to 106). When correlation among scores is high and correlation 
among means low (as here), analysis of covariance will often lead to 
a significant F when analysis of variance fails to reveal significant 
differences among the Y-means. These two r’s may be used, there- 
fore, in a preliminary way to decide whether analysis of covariance 
is worth while. 
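
As a preliminary check of this kind, the three correlations can be computed directly from the sums of squares and cross products. The Python sketch below is illustrative only; the helper name `r_from_ss` is hypothetical, and the inputs are the rounded SS's printed in Table 39, so the last decimal may differ slightly from the tabled r's.

```python
import math

def r_from_ss(ss_xy, ss_x, ss_y):
    """Correlation from sums of squares and cross products: r = SS_xy / sqrt(SS_x * SS_y)."""
    return ss_xy / math.sqrt(ss_x * ss_y)

# (SS_xy, SS_x, SS_y) as rounded in Table 39
table_39 = {
    "total": (400, 493, 668),
    "among means": (25, 63, 210),
    "within groups": (375, 430, 458),
}

correlations = {source: r_from_ss(*ss) for source, ss in table_39.items()}
for source, r in correlations.items():
    print(f"{source}: r = {r:.3f}")
```

A high within-groups r together with a low among-means r, as here, is the pattern under which covariance adjustment is most likely to repay the extra labor.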

Regression coefficients for total, among means and within groups
have been calculated by use of the formula b = Σxy/Σx² (p. 297). The
b_within is the most nearly unbiased estimate of the regression of Y
on X, since any systematic influence due to differences among means
has been removed. Therefore, b_within is used in the computation of
the adjusted Y-means in Step 8.


Step 8 


Y-means can be adjusted directly for differences in the X-means
by use of the formula M_y.x = M_y - b(M_x - Gen.M_x) * in which
the regression coefficient, b, is the b_within of .87. M_y is the original or
uncorrected Y-mean of a group; M_x is the corresponding X-mean of
a group and Gen.M_x is the mean of all X scores. It will be noted
that the B and C means receive more correction than the A mean
which is only slightly changed.

F_y.x tells us, it must be remembered (p. 294), that at least one of
our adjusted Y-means differs significantly from one other mean. To
determine which mean differences are significant we must first compute
the adjusted Y-means and then test these differences by the
t-test.


* See p. 202. y - bx = adjusted value of y, or M_y - bx = M_y.x. Substitute
x = (M_x - Gen.M_x) to give M_y.x = M_y - b(M_x - Gen.M_x).


Step 9


The Variance_y.x is 12 (Table 39) as compared with the Variance_y
of 38.2 and the SD_y.x is √12 or 3.46. From formula (71) we find
that the standard error of the difference between any two means is
2.18. For 11 df, t is 2.20 at the .05 and 3.11 at the .01 level. Substituting
for t and SE_D in the equation t = D/SE_D, we obtain
significant differences at the .05 level and .01 level of 4.80 and 6.78,
respectively. It is clear by reference to Step 8 that the adjusted A
mean is significantly higher than the B and C means (at the .01
level) but that B and C do not differ significantly. We may conclude,
therefore, that when initial differences are allowed for, praise
makes for significant changes in final score, but that scolding has
no greater effect than mere repetition of the test. Neither of these
last two factors makes for significant changes in test score.
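
The whole covariance procedure of Table 39 can be retraced with the Python sketch below. It is a minimal illustration, not a general-purpose routine: the function name `ancova` is hypothetical, a single covariate is assumed, and the within-groups sums are computed directly from deviations rather than by subtraction. Because the code does not round intermediate sums, its adjusted F comes out near 8.9 rather than the 8.83 obtained from the rounded figures in the table; the conclusion is unchanged.

```python
def ancova(groups):
    """One-covariate analysis of covariance.

    groups maps a label to a list of (x, y) pairs: x = initial score, y = final score.
    Returns the adjusted F ratio, the within-groups regression coefficient
    and the adjusted Y-means.
    """
    pairs = [p for g in groups.values() for p in g]
    n_total, k = len(pairs), len(groups)
    sum_x = sum(x for x, _ in pairs)
    sum_y = sum(y for _, y in pairs)

    # Total SS and cross products (Steps 1 and 2)
    ss_x_t = sum(x * x for x, _ in pairs) - sum_x ** 2 / n_total
    ss_y_t = sum(y * y for _, y in pairs) - sum_y ** 2 / n_total
    ss_xy_t = sum(x * y for x, y in pairs) - sum_x * sum_y / n_total

    # Within-groups SS and cross products (the result of Step 4, computed directly)
    ss_x_w = ss_y_w = ss_xy_w = 0.0
    for g in groups.values():
        mx = sum(x for x, _ in g) / len(g)
        my = sum(y for _, y in g) / len(g)
        ss_x_w += sum((x - mx) ** 2 for x, _ in g)
        ss_y_w += sum((y - my) ** 2 for _, y in g)
        ss_xy_w += sum((x - mx) * (y - my) for x, y in g)

    # Adjusted SS for Y by formula (73); 1 df is lost from the within term (Step 6)
    adj_total = ss_y_t - ss_xy_t ** 2 / ss_x_t
    adj_within = ss_y_w - ss_xy_w ** 2 / ss_x_w
    adj_among = adj_total - adj_within
    f_adjusted = (adj_among / (k - 1)) / (adj_within / (n_total - k - 1))

    # Adjusted Y-means using the within-groups regression coefficient (Step 8)
    b_within = ss_xy_w / ss_x_w
    general_mx = sum_x / n_total
    adjusted = {}
    for label, g in groups.items():
        mx = sum(x for x, _ in g) / len(g)
        my = sum(y for _, y in g) / len(g)
        adjusted[label] = my - b_within * (mx - general_mx)
    return f_adjusted, b_within, adjusted

data = {
    "A": [(15, 30), (10, 20), (20, 25), (5, 15), (10, 20)],   # praised
    "B": [(25, 28), (10, 12), (15, 20), (15, 10), (10, 10)],  # scolded
    "C": [(5, 10), (10, 15), (20, 20), (5, 10), (10, 10)],    # control
}
f_adj, b_w, adj_means = ancova(data)
print(round(f_adj, 2), round(b_w, 2), {g: round(m, 1) for g, m in adj_means.items()})
```

The adjusted means returned for A, B and C can then be compared in pairs against the critical differences of Step 9.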


Appendix to Chapter 10
(a) Calculation of SS_w [Example (1), p. 274]


For each column, SS = \sum X^2 - \frac{(\sum X)^2}{6}

Column    SS
A         [illegible]
B         [illegible]
C         454
D         300
E         [illegible]
F         488
G         1540
H         338
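(b) Derivation of the formula

SS_{y.x} = SS_y - \frac{(SS_{xy})^2}{SS_x}

Let X = independent variable
    Y = dependent variable
    r_{xy} = correlation between X and Y

Then  \sigma^2_{y.x} = \sigma^2_y (1 - r^2_{xy}) = \sigma^2_y - r^2_{xy} \sigma^2_y        (p. 162)

      r^2_{xy} = \frac{(\sum xy)^2}{\sum x^2 \cdot \sum y^2}        (p. 139)

Substituting, and remembering that \sigma^2_y = \sum y^2 / N,

      \sigma^2_{y.x} = \sigma^2_y - \frac{(\sum xy)^2}{N \sum x^2}

In terms of SS (multiplying through by N):

      SS_{y.x} = SS_y - \frac{(SS_{xy})^2}{SS_x}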


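(c) Derivation of formula

b = \frac{\sum xy}{\sum x^2}

b = r_{xy} \frac{\sigma_y}{\sigma_x}

Substituting r_{xy} = \frac{\sum xy}{N \sigma_x \sigma_y}:

b = \frac{\sum xy}{N \sigma^2_x} = \frac{\sum xy}{\sum x^2}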
PROBLEMS


1. In a learning experiment, 10 subjects are assigned at random to each of
six groups. Each group performs the same task but under slightly
different experimental conditions. Do the groups differ in mean performance?


           1      2      3      4      5      6
          41     40     36     14     41     55
          40     36     33     38     35     36
          39     40     29     51     52     41
          41     34     30     41     41     36
          39     34     45     36     34     48
          41     39     39     36     10     36
          36     36     33     36     44     42
          35     34     32     32     26     42
          35     41     34     38     54     34
          37     37     34     36     30     40

Sums     384    371    345    358    367    410     Grand sum 2235
2. Solve problem (2), page 243, by the methods of analysis of variance. 


3. Twenty subjects are paired on the basis of their initial scores on a test. 
Ten (one member of each pair) are then assigned to an experimental and 
10 to a control group. The experimental group is given special practice 
and both groups are retested. Data for final scores are as follows: 


                                  Pairs of Subjects
                       1    2    3    4    5    6    7    8    9   10   Total

Control group         25   46   93   45   15   64   47   56   73   66    530
Experimental group    36   57   89   67   19   78   46   59   69   70    590


(a) Do the groups differ significantly in mean performance? 

(b) Do subject-pairs differ significantly ? 

(c) Check the result in (a) by taking the difference between pairs of 
scores, and testing the mean difference (by t-test) against null 
hypothesis. 


4. In the following table * the entries represent blood cholesterol readings
taken from 18 patients in April and in May.
(a) Is the rise from April to May significant?
(b) Are there significant individual differences, regardless of month?
(c) From the column of differences, compute M_D and SD_D. Using the
t-test, measure the significance of M_D against the null hypothesis.
Compare with the result in (a).


* Fertig, John W. “The Use of Interaction in the Removal of Correlated
Variation,” Biometric Bull., 1936, 1, 1-14.


Individual      April         May     Difference         Sum
     1          158.0       190.5         32.5          348.5
     2          158.5       177.0         18.5          335.5
     3          137.5       172.0         34.5          309.5
     4          145.5       152.5          7.0          298.0
     5          130.5       147.0         16.5          277.5
     6          141.0       127.0        −14.0          268.0
     7          150.5       149.5         −1.0          300.0
     8          142.5       152.5         10.0          295.0
     9          148.0       147.0         −1.0          295.0
    10          137.5       130.5         −7.0          268.0
    11          137.0       133.0         −4.0          270.0
    12          160.0       145.5        −14.5          305.5
    13          145.0       124.5        −20.5          269.5
    14          149.5       156.0          6.5          305.5
    15          145.0       143.5         −1.5          288.5
    16          132.5       146.0         13.5          278.5
    17          139.0       148.0          9.0          287.0
    18          151.0       161.0         10.0          312.0

Sum            2608.5      2703.0         94.5         5311.5
SS          379288.25    410872.0      4311.25     1576009.25


5. In an experiment by Mowrer,* previously unrotated pigeons were tested
for clockwise postrotational nystagmus. The rate of rotation was one
revolution in 1½ sec. An average initial score for each pigeon based upon
2 tests is indicated by the symbol X. The 24 pigeons were then divided
into 4 groups of 6 each. Each group was then subjected to 10 daily
periods of rotation under one of the experimental conditions indicated
below. The rotation speed was the same as during the initial test and the
rotation periods lasted 30 sec., with a 30-sec. rest interval between each
period. Groups 1, 2 and 3 were practiced in a clockwise direction only.
For Group 4 the environment was rotated in a counterclockwise direction.
At the end of 24 days of practice, each group was tested again
under the same conditions as on the initial test. These records are
called Y.


* From Edwards, A. L., Experimental Design in Psychological Research (New 
York: Rinehart, 1950), p. 357. 




     Group 1             Group 2             Group 3             Group 4
 Rotation of body    Rotation of body    Rotation of body    Rotation of
 only. Vision        only. Vision        and environment     environment
 excluded            permitted                               only

 Initial   Final     Initial   Final     Initial   Final     Initial   Final
    X        Y          X        Y          X        Y          X        Y
  23.8      7.9       28.5     25.1       27.5     20.1       22.9     19.9
  23.8      7.1       18.5     20.7       28.1     17.7       25.2     28.2
  22.6      7.7       20.3     20.3       35.7     16.8       20.8     18.1
  22.8     11.2       26.6     18.9       18.5     13.5       27.7     30.5
  22.0      6.4       21.3     25.4       25.9     21.0       19.1     19.3
  19.6     10.0       24.0     30.0        7.9     29.3       32.2     35.1

 134.6     50.3      139.1    140.4      158.6    118.4      147.9    151.1


(а) Test the significance of the differences among X-means. (Com- 
pute the among groups and within groups variance and use F-test.) 

(b) Do same as in (a) for the Y-scores. 

(c) By analysis of covariance test the differences among the adjusted
means in Y. How much is the variance among Y-means reduced
when X is held constant?

(d) Compute the adjusted Y-means, M_{y.x}, by the method of p. 292.

(e) From the t-test find that difference among adjusted Y-means which 
is significant at the .05 level; at the .01 level. 


ANSWERS


1. No. F = 50.8/54.7 = .93, and differences among means may be attributed
   entirely to sampling fluctuations.
2. F = 5.16 and t = 2.27 (√F)
3. (a) No. F = 180/35.3 = 5.10
   (b) Yes. F = 911.8/35.3 = 25.83
   (c) t = 6.0/2.66 = 2.26; t² = F = 5.10
4. (a) No. F = 248.1/112.2 = 2.21. df = 1/17 and F.05 = 4.45 (Table F)
   (b) Yes, just barely. F = 255.1/112.2 = 2.27. df = 17/17 and F.05 = 2.28
   (c) M_D = 5.25; SE_D = 3.53. t = 5.25/3.53 = 1.49; F = t² = 2.22. df = 17
5. (a) Difference among X-means not significant (among-groups
       variance = 18.7).
   (b) Y-means differ significantly. F_y = 341.4/24.9 = 13.7. For df of 3/20,
       F.01 = 4.94.
   (c) F_{y.x} = 15.3. Variance among Y-means is reduced 11%,
       from 341.4 to 303.4.
   (d) 9.3, 23.9, 18.6 and 24.9
   (e) 5.31; 7.26




THE SCALING OF MENTAL TESTS AND OTHER 
PSYCHOLOGICAL DATA 


* 


Various devices, many of them based upon the normal probability
curve, have been used in the scaling of psychological and educational
data. As used in mental measurement, a scale may be thought of as 
a continuum or continuity along which items, tasks, problems and the 
like have been located in terms of difficulty or some other attribute. 
The units of a scale are arbitrary and depend upon the method em- 
ployed by the investigator. Ideally, scale units should be equal, have 
the same meaning, and remain stable throughout the scale. Several 
scaling procedures will be described in this chapter. 


I. The Scaling of Test Items


1. Scaling individual test items in terms of difficulty (σ-scaling)


We sometimes wish to construct a test which shall contain prob- 
lems or tasks graded in difficulty from very easy to very hard by 
known steps or intervals. If we know what proportion of a large 
group is able to solve each problem, it is comparatively easy to 
arrange our items in a percentage order of difficulty. Such an ar- 
rangement constitutes a scale, to be sure, but a crude one, as per- 
centage differences are not satisfactory indices of differences in 
difficulty (p. 314). 

If we are justified in assuming normality in the trait being measured,
the variability (i.e., σ) of the group will give us a better scaling
unit than will percentage passing (p. 315). Test items may be “set”
or spaced in terms of σ-difficulty at definite points along a difficulty
continuum; their positions with respect to each other as well as with
respect to some reference point or “zero” are then known in terms of
a stable unit. To illustrate σ-scaling, suppose that we wish to
construct a scale for measuring “reasoning ability” (e.g., by means of
syllogisms) in 12-year-olds; or a scale for measuring mechanical
ingenuity in high-school juniors; or a scale for determining degree of
suggestibility in college freshmen. The steps in constructing such a
device may be outlined briefly as follows:


(1) Compile a large number of problems or other test items. These 
items should vary in difficulty from very easy to very hard and 
all sample the behavior to be tested. 

(2) Administer the items to a large group drawn randomly from 
those for whom the final test is intended. 

(3) Compute the percentage of the group which can solve each 
problem. Discard duplicate items and those too easy or too hard 
or unsatisfactory for other reasons.* Arrange the problems
retained in an order of percentage difficulty. An item done cor-
rectly by 90% of the group is obviously less difficult than one 
solved by 75%; while the second problem is less difficult than 
one solved by only 50%. The larger the per cent passing, the 
lower the item in a scale of difficulty. 

(4) By means of Table A convert the per cent solving each problem
into a σ-distance above or below the mean. For example: an
item done correctly by 40% of the group is 10% or .25σ above
the mean. A problem solved by 78% is 28% (78% − 50%) or
.77σ below the mean. We may tabulate the results for 5 items,
taken at random, as follows (see Fig. 50):


Problems                        A       B       C       D       E

Per cent solving:              93      78      55      40      14
Distance from the mean
  in percentage terms:        −43     −28      −5      10      36
Distance from the mean
  in σ-terms:               −1.48    −.77    −.13     .25    1.08


Problem A is solved by 93% of the group, i.e., by the upper 50%
(the right half of the normal curve) plus the 43% to the left of
the mean. This puts Problem A at a point −1.48σ from the
mean. In the same way, the percentage distance of each problem
from the mean (measured in the plus or minus direction)
can be found by subtracting the per cent passing from 50%.
From these percentages, the σ-distance of the problem above or
below the mean is read from Table A.


* Adkins, D. C., et al., Construction and Analysis of Achievement Tests
(Washington, D. C.: U. S. Government Printing Office, 1947), Chap. II.
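As a rough check on step (4), the σ-distance can be computed from the inverse of the normal distribution instead of being read from Table A. This is a minimal sketch, assuming the standard-library `statistics.NormalDist` as a stand-in for the table; the function name `sigma_difficulty` is hypothetical.

```python
from statistics import NormalDist


def sigma_difficulty(percent_passing):
    """σ-distance of an item from the mean, given the per cent passing it.

    An item passed by more than 50% is easier than average and falls
    below the mean (negative value); one passed by fewer falls above it.
    """
    return NormalDist().inv_cdf(1 - percent_passing / 100)


# Per cent solving each of the five problems in the text.
items = {"A": 93, "B": 78, "C": 55, "D": 40, "E": 14}
sigma_values = {name: round(sigma_difficulty(p), 2) for name, p in items.items()}
print(sigma_values)
# {'A': -1.48, 'B': -0.77, 'C': -0.13, 'D': 0.25, 'E': 1.08}
```

The values agree with the σ-terms tabulated for Problems A to E.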


FIG. 50 


(5) When the σ-distance of each item has been established, calculate
the σ-distance of each item from the zero point of ability in
the trait. A zero point may be located as follows: Suppose that
5% of the entire group fail to solve a single problem. This
would put the level of zero ability 45% of the distribution below
the mean, or at a distance of −1.65σ from the mean.* The
σ-value of each item in the scale may then be computed from
this zero. To illustrate with the 5 problems above:


Problems                        A       B       C       D       E
σ-distance from mean:       −1.48    −.77    −.13     .25    1.08
σ-distance from arbitrary
  zero, −1.65:                .17     .88    1.52    1.90    2.73


The simplest way to find σ-distances from a given zero is to subtract
the zero point algebraically from the σ-distance of each item
from the mean. Problem A, for example, is −1.48 − (−1.65) or
.17σ from the arbitrary zero; and Problem E is 1.08 − (−1.65)
or 2.73σ from our zero.


* This is, of course, an arbitrary, not a true zero. It will serve, however, as a
reference point (level of minimum ability) from which to measure performance.
The point −3.00σ is often taken as a convenient reference point.
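The shift to an arbitrary zero in step (5) is a single subtraction per item. The sketch below takes the zero point of −1.65σ as given in the text rather than computing it; the exact inverse normal for 5% is about −1.645, which the text reports as −1.65. The variable names are illustrative only.

```python
# σ-distances of the five problems from the mean, as tabulated in the text.
sigma_from_mean = {"A": -1.48, "B": -0.77, "C": -0.13, "D": 0.25, "E": 1.08}

# Arbitrary zero: the point below which 5% of the group fall (from the text).
zero_point = -1.65

# Subtract the zero point algebraically from each σ-distance.
sigma_from_zero = {
    name: round(distance - zero_point, 2)
    for name, distance in sigma_from_mean.items()
}
print(sigma_from_zero)
# {'A': 0.17, 'B': 0.88, 'C': 1.52, 'D': 1.9, 'E': 2.73}
```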

(6) When the distance of each item from the given zero has been
determined, the difficulty value of each item with respect to the
other items and with respect to zero is known and the scaling is
finished. The next steps depend upon the purpose of the investigator.
He may select items separated by fixed σ-distances (.5σ,
say) to cover a wide range of talent. Or he may limit the range
of talent from −2.50σ to 2.50σ, say, and not attempt to establish
equal difficulty steps. Norms are derived from the final scale for
age, grade, occupational or other groups.


2. Scaling total scores on a test 


In the last section we saw how individual test items can be scaled 
in σ-units by assuming normality in the trait being measured. We
shall now describe two methods of scaling score totals or aggregates 
of items—procedures generally followed in constructing aptitude and 
achievement tests. 


(1) σ-SCORES AND STANDARD SCORES


Let us suppose that the mean of a test is 122 and the σ is 24. Then
if John earns a score of 146 on this test, his deviation from the mean
is 146 − 122 or 24. Dividing John's deviation of 24 by the σ of the
test, we give him a σ-score of 24/24 or 1.00. If William’s score is
110 on this test, his deviation from the mean is 110 − 122 or −12;
and his score in σ-units is −.5. Deviations from the mean expressed
in σ-terms are called σ-scores, z-scores, and reduced scores. Of these
designations, σ-score is certainly the most descriptive, but the other
terms are often used. We have already used the concept of a σ-score
in the problems in Chapter 5, p. 104.

The mean of a set of σ-scores is always 0 (the reference point)
and the σ is always unity or 1.00. As approximately half of the scores
in a distribution will lie below and half above the mean, about half
of our σ-scores will be negative and half positive. In addition,
σ-scores are often small decimal fractions and hence somewhat awkward
to deal with in computation. For these reasons, σ-scores are
usually converted into a new distribution with M and σ so selected as
to make all scores positive and relatively easy to handle. Such scores
are called standard scores. Raw test scores of the Army General
Classification Test, for example, are expressed as standard scores in
a distribution of M = 100 and σ = 20; sub-tests of the Wechsler-Bellevue
are converted into standard scores in a distribution of
M = 10 and σ = 3; and the tests of the Graduate Record Examination
into standard scores in a distribution of M = 500 and σ = 100.

The shift from raw to standard score requires a linear transformation.*
This transmutation does not change the shape of the distribu-
tion in any way; if the original distribution was skewed (or normal), 
the standard score distribution will be skewed or normal in exactly 
the same fashion. The formula for conversion of raw to standard 
score is as follows: 


Let X = a score in the original distribution
    X' = a standard score in the new distribution
    M and M' = means of the raw score and standard score distributions
    σ and σ' = SD's of raw and standard scores

Then $\dfrac{X' - M'}{\sigma'} = \dfrac{X - M}{\sigma}$

or $X' = \dfrac{\sigma'}{\sigma}(X - M) + M'$   (74)


(formula for converting raw scores to standard scores)
An illustration will show how the formula works. 


Example (1) Given a distribution with Mean = 86 and σ = 15.
Tom's score is 91 and Mary's 83. Express these raw scores as
standard scores in a distribution with a mean of 500 and σ of 100.


By formula (74)

X' = (100/15)(X − 86) + 500


Substituting Tom’s score of 91 for X we have


X' = 6.67 (91 − 86) + 500
   = 533


Substituting Mary's score of 83 for X,


X' = 6.67 (83 − 86) + 500
   = 480


* When the equation connecting two variables, y and x, is that of a straight
line, changing x's into y's involves a linear transformation. Formula (74) is
the equation of a straight line, analogous to the general equation of a straight
line, y = mx + b.
In a distribution with a mean of 10 and a σ of 3, Tom's standard
score would be 11 and Mary's 9.4; in a distribution with a mean of
100 and a σ of 20, Tom's standard score would be 107 and Mary's 96.
Other scaling distributions may, of course, be employed.
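Formula (74) is short enough to write as a one-line function. The sketch below is an illustration under that formula only; the function name `to_standard` and its parameter names are hypothetical. It reproduces Tom's and Mary's standard scores in the three scaling distributions mentioned in the text.

```python
def to_standard(x, mean, sd, new_mean, new_sd):
    """Formula (74): X' = (σ'/σ)(X − M) + M'."""
    return (new_sd / sd) * (x - mean) + new_mean


# Raw-score distribution of Example (1): Mean = 86, σ = 15.
raw_mean, raw_sd = 86, 15

# Target distributions (mean, σ) named in the text.
targets = [(500, 100), (10, 3), (100, 20)]

for name, raw in (("Tom", 91), ("Mary", 83)):
    converted = [
        round(to_standard(raw, raw_mean, raw_sd, m, s), 1) for m, s in targets
    ]
    print(name, converted)
# Tom [533.3, 11.0, 106.7]
# Mary [480.0, 9.4, 96.0]
```

Rounded to whole numbers where the text does so, these are 533, 11 and 107 for Tom, and 480, 9.4 and 96 for Mary.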

Scores made by the same individual upon several tests cannot 
usually be compared directly owing to differences in test units. Thus 
a score of 162 on a group intelligence test and a score of 126 on an 
educational achievement examination cannot be compared meaning- 
fully. If scores like these are expressed as standard scores, however, 
they can be compared provided the distributions of raw scores are of 
the same form. Fortunately, most distributions of scores are so 
nearly bell-shaped (p. 113) that no great error is made in treating
them as normal. When we can assume normality, a score of 1.00σ
on a mechanical aptitude test and a score of 1.00σ on a test of
mechanical interests represent the same relative degree of achieve- 
ment: both are exceeded by approximately 16% of those taking the 
two tests (Table A). A problem will illustrate further this important 
aspect of standard scores. 


Example (2) Given a reading test with a mean of 81 and σ of
12; and an arithmetic test with a mean of 33 and a σ of 8. Sue's
score is 72 in reading and 27 in arithmetic. Assuming the distributions
of reading and arithmetic scores to be of the same form
(approximately normal), convert Sue's scores into a standard score
distribution with Mean = 100 and σ = 20 and compare them.


In the reading test Sue’s score is 9 below the mean of 81. Hence, her
score is at −.75σ (−9/12) and her new score is 85 (100 − .75 × 20).
In arithmetic Sue's score is 6 points below the mean; again her
score is at −.75σ and her new score is 85 (100 − .75 × 20). Sue's
two standard scores are comparable, and are also equivalent (represent
the same degree of achievement), if our assumption of normality of
distributions is tenable.


(2) NORMALIZING THE FREQUENCY DISTRIBUTION; THE T-SCALE


Instead of being expressed as standard scores, the raw scores of a
frequency distribution may be converted into a system of “normalized”
standard scores by transforming them into equivalent points in a
normal distribution. Equivalent scores (p. 306) are measures which
indicate the same level of talent. Suppose that, in a certain test,
20% of the group achieve scores better than 73. Now from Table A
we find that 20% of the area of the normal probability curve lies
above .84σ (30% falls between the mean and .84σ). Hence score 73
is equivalent to .84σ in the normal distribution, as both reflect the
same degree of achievement.

Normalized standard scores are generally called T-scores. T-scal- 
ing was devised by McCall * and first used by him in the construction 
of a series of reading tests designed for use in the elementary grades. 
The original T-scale was based upon the reading scores achieved by
500 12-year-olds; and the scores earned by other age groups on the 
same reading test were expressed in terms of 12-year-old perform- 
ance. Since this first use of the method, T-scaling has been employed 
with various groups and with different tests so that it no longer has 
reference specifically to 12-year-olds nor to reading tests. 

T-scores are normalized standard scores converted into a distribution
with a mean of 50 and σ of 10. In the σ-scaling of individual
items, the mean, as we know, is at zero and σ is 1.00. The point of
reference, therefore, is zero and the unit of measurement is 1. If the
point of reference is moved from the mean of the normal curve to a
point 5σ below the mean, this new reference point becomes zero in
the scale and the mean is 5. As shown in Figure 51, the σ-divisions
above the mean (1σ, 2σ, 3σ, 4σ, 5σ) become 6, 7, 8, 9 and 10; and the
σ-divisions below the mean (−1σ, −2σ, −3σ, −4σ, −5σ) are 4, 3, 2, 1
and 0. The σ of the distribution remains, of course, equal to 1.00.


FIG. 51 To illustrate σ-scaling and T-scaling in a normal distribution


* McCall, William A., Measurement (New York: Macmillan, 1939), Chap. 22.
Only slight changes are needed in order to convert this σ-scale into
a T-scale. The T-scale begins at −5σ and ends at +5σ. But σ is
multiplied by 10 so that the mean is 50 and the other divisions are
0, 10, 20, 30, 40, 50, 60, 70, 80, 90 and 100. The relationship of the
T-scale to the ordinary σ-scale is shown in Figure 51. Note that the
T-scale ranges from 0 to 100; that its unit, i.e., T, is 1 and that the
mean is 50. T, of course, equals .1 of σ, which is equal to 10. The
reference point on the T-scale is set at −5σ in order to have the scale
cover exactly 100 units. This is convenient but it puts the extremes of
the scale far beyond the ability ranges of most groups. In actual
practice, T-scores range from about 15 to 85, i.e., from −3.5σ to 3.5σ.

The procedure to be followed in T-scaling a set of scores can best 
be shown by an example. We shall outline the process in a series of 
steps, illustrating each step by reference to the data of Table 40. 


TABLE 40 To illustrate the calculation of T-scores


  (1)      (2)      (3)         (4)              (5)         (6)
                             Cum. Freq.
  Test             Cum.      below + 1/2       Col. (4)
  Score     f       f      on given score      in %'s      T-Score
   10       1      62          61.5             99.2          74
    9       4      61          59               95.2          67
    8       6      57          54               87.1          61
    7      10      51          46               74.2          56
    6       8      41          37               59.7          52
    5      13      33          26.5             42.7          48
    4      18      20          11               17.7          41
    3       2       2           1                1.6          29

(1) Compile a large and representative group of test items which
vary in difficulty from easy to hard. Administer these items to a
sample of subjects (children or adults) for whom the final scale
is intended.

(2) Compute the per cent passing each item. Arrange the items in
an order of difficulty in terms of these percentages.

(3) Administer the test to a representative sample and tabulate the
distribution of total scores. Total scores may now be scaled as
shown in Table 40 for 62 subjects. In column (1) the test scores
are entered; and in column (2) are listed the frequencies (the number
of subjects achieving each score). Two subjects had scores of
3, 18 had scores of 4, 13 scores of 5, and so on. In column (3)
scores have been cumulated (p. 63) from the low to the high
end of the frequency distribution. Column (4) shows the number
of subjects who fall below each score plus one-half of those
who earn the given score. The entries in this column may readily
be computed from columns (2) and (3). There are no scores
below 3 and 2 scores on 3, so that the number below 3 plus one-half
on 3 equals 1. There are 2 scores below 4 [see column (3)]
and 18 on 4 [column (2)]; hence the number of scores below
4 plus one-half on 4 is 2 + 9 or 11. There are 20 scores below
5 (2 + 18) and 13 scores on 5 [column (2)] so that the number
below 5 plus one-half on 5 is 20 + 6.5 or 26.5. The reason why
one-half of the frequency on a given score must be added to the
frequency falling below that score is that each score is an
interval, not a point on the scale. The score of 4, for example,
covers the interval 3.5-4.5, midpoint 4.0. If the 18 frequencies
on score 4 are thought of as distributed evenly over the interval,
9 will lie below and 9 above 4.0, the midpoint. Hence, if we
add 9 to the 2 scores below 4 (i.e., below 3.5) we obtain 11 as
the number of scores below 4.0, the midpoint of the interval
3.5-4.5. Each sum in column (4) is taken up to the midpoint
of a score-interval.

(4) In column (5) the entries in column (4) are expressed as per
cents of N (here 62). Thus, 99.2% of the scores lie below 10.0,
midpoint of the interval 9.5-10.5; 95.2% of the scores lie below
9.0, midpoint of 8.5-9.5, etc.

(5) Turn the per cents in column (5) into T-scores by means of
Table G. T-scores in Table G corresponding to percentages
nearest to those wanted are taken without interpolation, as fractional
T-scores are a needless refinement. Thus for 1.6% we
take 1.79% (T-score = 29); for 17.7% we take 18.41% (T-score
= 41), and so on.

In Table G, percentages lying to the left of (i.e., below) succeeding
σ-points expressed as T-scores have been tabulated, rather than
per cents between the mean and given σ-points as in Table A. In
Table G, we are enabled, therefore, to read T-scores directly; but the
student will note that T-scores can also be read from Table A. To
illustrate with score 8 in Table 40, which has a percentage-below-plus
one-half-reaching of 87.1, note that a score failed by 87.1% lies
37.1% (87.1% − 50.0%) to the right of the mean. From Table A,
we read that 37.1% of the distribution lies between the mean and
1.13σ. Since the σ of the T-scale is 10, 1.13σ becomes 11 in T-units;
and adding 11 to 50, the mean, we get 61 as the required T-score (see
Fig. 51).
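The T-scaling steps for Table 40 can be sketched in a few lines. This is an illustration, not the book's procedure to the letter: it assumes the standard-library `statistics.NormalDist` inverse in place of Table G (so T-scores are rounded from the exact σ-value rather than taken from the nearest tabled percentage), and the function name `t_scores` is hypothetical.

```python
from statistics import NormalDist


def t_scores(frequencies):
    """Map each raw score to a T-score (mean 50, σ 10).

    frequencies: dict of raw score -> number of subjects earning it.
    For each score, take the number of cases below it plus one-half of
    those on it, express that as a proportion of N, and convert the
    proportion to a normal deviate.
    """
    n = sum(frequencies.values())
    normal = NormalDist()
    below = 0
    result = {}
    for score in sorted(frequencies):
        f = frequencies[score]
        to_midpoint = below + f / 2          # column (4)
        proportion = to_midpoint / n         # column (5), as a proportion
        result[score] = round(50 + 10 * normal.inv_cdf(proportion))
        below += f
    return result


# Columns (1) and (2) of Table 40.
table_40 = {3: 2, 4: 18, 5: 13, 6: 8, 7: 10, 8: 6, 9: 4, 10: 1}
t_table = t_scores(table_40)
print(t_table)
# {3: 29, 4: 41, 5: 48, 6: 52, 7: 56, 8: 61, 9: 67, 10: 74}
```

For these data the computed values agree with column (6) of Table 40.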


FIG. 52 Histogram of the sixty-two scores in Table 40


FIG. 53 Normalized distribution of the scores in Table 40 and
Figure 52. Original scores and T-score equivalents are shown
on baseline


Figure 52 shows a histogram plotted from the distribution of 62 
scores in Table 40. Note that the scores of 3, 4, 5, etc., are spaced at 
equal intervals along the baseline, i.e., along the scale of scores.
When these raw scores are transformed into normalized standard
scores (T-scores) they occupy the positions in the normal curve
shown in Figure 53. The unequal scale distances between the scores
in Figure 53 show clearly that, when normality is forced upon a
trait, the original scores do not represent equal difficulty steps. In
other words, normalizing a distribution of test scores alters the original
test units (stretching them out or compressing them) and the
more skewed the raw score distribution, the greater is the change
in unit.

T-scores have general applicability, a convenient unit, and cover 
a wide range of talent. Besides these advantages, T-scores from different
tests are comparable and have the same meaning, since refer-
ence is always to a standard scale of 100 units based upon the normal 
probability curve. T-scaling forces normality upon the scores of a 
frequency distribution and is unwarranted if the distribution of the 
trait in the population is not normal. For the distributions of most 
mental abilities in the population, however, normality is a reason- 
able—and is often the only feasible—assumption. 


(3) A COMPARISON OF T-SCORES AND STANDARD SCORES


T-scores are sometimes confused with standard scores, but the
assumptions underlying the two sorts of measures are quite different.
Table 41 repeats the data of Table 40, and shows the T-score equivalents
to the given raw scores. Standard scores with a mean of 50 and
σ of 10 are listed in column (4) for comparison with the T-scores.


TABLE 41 Comparison of T-scores and standard scores

(Data from Table 40)


  (1)       (2)        (3)            (4)
 Score       f      T-Scores    Standard Scores
   10        1         74             75
    9        4         67             69
    8        6         61             63
    7       10         56             57
    6        8         52             52
    5       13         48             46
    4       18         41             40
    3        2         29             34
          N = 62

Equation for converting test scores into standard scores (see p. 306)

For test scores:  M = 5.73,  σ = 1.72

    (X − 5.73)/1.72 = (X' − 50)/10
    X' − 50 = 10X/1.72 − (10 × 5.73)/1.72
    X' = 5.82X − 33.3 + 50
    X' = 5.82X + 16.7


These standard scores were calculated by means of formula (74) on
page 306. The mean of the raw scores is 5.73 and the σ is 1.72; and
the mean of the “new” standard score distribution is, of course, 50,
with σ of 10. Substituting these values in formula (74) we have


X' = 5.82X + 16.7


as our transformation equation. Putting 3, 4, 5, etc., for X in this
equation we find X''s of 34, 40, 46, etc. These X' scores will be found
to correspond fairly closely to the T-scores. This is often the case,
and the more nearly normal the distribution of raw scores the closer
the correspondence. The two kinds of scores are not interchangeable,
however. With respect to original scores, T-scores represent equivalent
scores in a normal distribution. Standard scores, on the other
hand, always have the same form of distribution as raw scores, and
are simply original scores expressed in σ-units. Standard scores represent
the kind of conversion we make when we change inches to
centimeters or kilograms to pounds; that is, the transformation is
linear. Standard scores correspond exactly to T-scores when the
distribution of raw scores is strictly normal.
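The linear standard scores in column (4) of Table 41 can be recomputed from the frequencies alone. The sketch below is illustrative and uses names of its own choosing. It computes M and σ with N in the denominator and then applies the linear conversion to a mean of 50 and σ of 10. The slope it uses is 10/σ with σ unrounded (about 5.80), so it differs slightly from the 5.82 of the text, but the rounded standard scores are the same.

```python
from math import sqrt

# Scores and frequencies from Table 40.
table_40 = {3: 2, 4: 18, 5: 13, 6: 8, 7: 10, 8: 6, 9: 4, 10: 1}

n = sum(table_40.values())
mean = sum(score * f for score, f in table_40.items()) / n
sd = sqrt(sum(f * (score - mean) ** 2 for score, f in table_40.items()) / n)

# Linear conversion to a distribution with mean 50 and σ 10.
standard_scores = {
    score: round(50 + 10 * (score - mean) / sd) for score in sorted(table_40)
}

print(round(mean, 2), round(sd, 2))   # 5.73 1.72
print(standard_scores)
# {3: 34, 4: 40, 5: 46, 6: 52, 7: 57, 8: 63, 9: 69, 10: 75}
```

Unlike T-scores, these values depend only on M and σ, so the shape of the raw-score distribution is carried over unchanged.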


(4) PERCENTILE SCALING 


A child who earns a certain score on a test can be assigned a per- 
centile rank (PR) * of 27, 42 or 77, say, depending upon his position 
in the score distribution. Percentile rank locates a child on a scale 
of 100, and tells us immediately what proportion of the group has 
achieved scores lower than he. Moreover, when a child has taken 
several tests, a comparison of his PR’s provides measures of relative 
achievement, which may be combined into a final total score. As a 
method of scaling test scores, PR’s have the practical advantage of 
being readily calculated and easily understood. But the percentile 
scale also possesses marked disadvantages which limit its usefulness. 

Percentile scales assume that the difference between a rank of 10 
and a rank of 20 is the same as the difference between a rank of 40 
and a rank of 50, namely, that percentile differences are equal 
throughout the scale. This assumption of equal percentile units holds 
strictly only when the distribution of scores is rectangular in shape; 
it does not hold when the distribution is bell-shaped, or approxi- 
mately normal. Figure 54 shows graphically why this is true. In the 
diagram we have a rectangular distribution and a normal curve of 
the same area plotted over it. When the rectangle is divided into 5 
equal segments, the areas of the small rectangles are all the same 

* For method of computing PR’s, see p. 68. 
FIG. 54 To illustrate the position of the same five percentiles
in rectangular and normal distributions


(20%) and the distances from 0 to 20, 20 to 40, 40 to 60, 60 to 80, and 
80 to 100 are all equal. These percentiles, P20, P40, etc., have been
marked off along the top of the rectangle.

Now let us compare the distances along the baseline of the normal
curve when these are determined by successive 20% slices of area.
These baseline intervals can be found in the following way. From
Table A we read that the 30% of area to the left of the mean extends
to −.84σ. The first 20% of a normal distribution, therefore, falls
between −3.00σ and −.84σ and covers a distance of 2.16σ along the
baseline. The second 20% (P20 to P40) lies between −.84σ and −.25σ
(since −.25σ is at a distance of 10% from the mean), and covers a
distance of .59σ along the baseline. The third 20% (P40 to P60) lies
between −.25σ and .25σ: it straddles the mean and covers .50σ on the
baseline. The fourth and fifth 20%’s occupy the same relative positions
in the upper half of the curve as the second and first 20%
occupy in the lower half of the curve. To summarize:


First 20% of area covers a distance of 2.16σ
Second 20% of area covers a distance of .59σ
Third 20% of area covers a distance of .50σ
Fourth 20% of area covers a distance of .59σ
Fifth 20% of area covers a distance of 2.16σ
It is clear (1) that intervals along the baseline from the extreme
left end (0 to P20, P20 to P40, etc.) to the extreme right end of the
normal curve are not equal when determined by successive 20%
slices of area; and (2) that inequalities are relatively greater at the
two ends of the distribution, so that the two end fifths are 4 times as
long as the middle one.
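The baseline distances summarized above can be recomputed as a check. This is a minimal sketch under two assumptions taken from the text's own working: the curve is cut off at ±3.00σ, and each cutting point is rounded to two decimals, as it would be when read from Table A. The function name `slice_widths` is hypothetical, and `statistics.NormalDist` stands in for the table.

```python
from statistics import NormalDist


def slice_widths(n_slices=5, limit=3.0):
    """Baseline width (in σ) of each equal-area slice of a normal curve.

    The curve is treated as ending at ±limit, and interior cutting
    points are rounded to two decimals as in a table lookup.
    """
    normal = NormalDist()
    cuts = (
        [-limit]
        + [round(normal.inv_cdf(k / n_slices), 2) for k in range(1, n_slices)]
        + [limit]
    )
    return [round(upper - lower, 2) for lower, upper in zip(cuts, cuts[1:])]


widths = slice_widths()
print(widths)                              # [2.16, 0.59, 0.5, 0.59, 2.16]
print(round(widths[0] / widths[2], 1))     # 4.3
```

The end fifths come out a little over four times as long as the middle fifth, in line with the statement in the text.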

Distributions of raw scores are rarely if ever rectangular in form. 
Hence equal per cents of N (area) cannot be taken to represent equal 
increments of achievement and the percentile scale does not progress 
by equal steps. Between Q1 and Q3, however, equal per cents of area
are more nearly equally spaced along the baseline (see Fig. 54), so 
that the PR’s of a child in two or more tests may be safely combined 
or averaged if they fall within these limits. But high and low PR’s 
(above 75 and below 25) should be combined, if at all, with full 
knowledge of their limitations. 


TABLE 42 Percentile distributions for nine-year-olds on three tests

Method of Combining the Percentile Ranks of a Single Individual


                                          Percentiles                          S's    Perc.
Tests                  0   10   20   30   40   50   60   70   80   90  100   Score   Rank
Picture Completion    62  240  297  325  372  407  440  450  499  577  646     445     65
Substitution         219  190  173  158  152  141   13  126  121  109   80     126     70
Seguin Form-Board     34   24    1   20   18   18   17   16   15   15   13      17     60
Median Percentile Rank                                                                 65


Table 42 gives an illustration of the value of percentile scaling 
when tests scored in different units are to be compared and combined. 
Percentile distributions for 9-year-olds are shown for three tests 
Írom the Pintner-Paterson Scale of Performance Tests.* Тһе sub- 
ject, a 9-year-old boy, made a score of 445 on the Completion Test 
which gave him a PR of 65 (midway between 60 and 70). On the 
Substitution Test, a score of 126 gave him a РЕ of 70; and on the 
Seguin Form Board a score of 17 gave him a РЕ of 60. Тһе scores іп 
the last two tests are in time units (seconds) so that the lowest 
scores numerically represent the highest performance. The median 
of this boy's PR’s is 65, indicating that he stands somewhat above 
the average of 9-year-olds. Since none of these PR’s is extremely 
high or low, they may be combined with little error. 
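
The reading of a PR from a percentile table, and the combination of several PR's by their median, can be sketched in a few lines of Python. This is only an illustration: the function name `percentile_rank` and the use of straight-line interpolation between tabled points are assumptions of the sketch, and it handles only tests on which higher scores mean better performance (the two time-scored tests are entered directly with the PR's given in the text).

```python
from statistics import median

def percentile_rank(score, cut_scores, step=10):
    """Interpolate a PR linearly between tabled percentile points.

    cut_scores holds the raw scores at percentiles 0, step, 2*step, ...
    and is assumed to rise with the percentile (an assumption of this
    sketch; time-scored tests would need their scale reversed first).
    """
    for i in range(len(cut_scores) - 1):
        low, high = cut_scores[i], cut_scores[i + 1]
        if high > low and low <= score <= high:
            return step * (i + (score - low) / (high - low))
    raise ValueError("score lies outside the tabled range")

# Picture Completion row of Table 42 (percentiles 0, 10, ..., 100).
completion = [62, 240, 297, 325, 372, 407, 440, 450, 499, 577, 646]

# A score of 445 lies midway between the 60th and 70th percentile points.
pr_completion = percentile_rank(445, completion)

# Median of the boy's three PR's (the other two are taken from the text).
combined_pr = median([pr_completion, 70, 60])
```

The median is used here, as in Table 42, because it is not pulled about by a single PR from the stretched ends of the percentile scale.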


* Pintner, R., and Paterson, D. G., A Scale of Performance Tests (New York: 
D. Appleton & Co., 1925). pp. 189, 197. 
II. The Scaling of Judgments


1. Converting judgments into normal curve units (product scales) 


We have seen in the last section how test scores may be scaled on
the principle that the σ-value determined from the percentage passing
a given item is an acceptable index of difficulty. It often happens,
however, that the ability or trait in which we are interested is of such
a nature that achievement cannot be expressed by a test score. This
necessitates the construction of what are called product scales. In
such scales excellence of performance is evaluated by comparing an
individual's production with various “standard productions,” the
values of which have been determined beforehand by a consensus
of expert judgment. Scales for handwriting, compositions, and drawings
are well-known examples of product scales. The excellence of a
person's penmanship, for example, can be determined by comparing a
sample of his writing with various specimens of handwriting, the
quality of which has been measured against some criterion.

Product scales are constructed on the principle that “equally often
noticed differences” in quality are equal. If composition A, for example,
is rated better than composition B by 75% of a group of competent
judges, and composition X is rated better than composition Y
by 75% of the same judges, then the difference between A and B is
taken to be the same as the difference between X and Y (because
equally often observed).

The assumption that “equally often noticed differences are equal” 
has been criticized * and is most doubtful when applied to the scaling 
of items at the extremes of the qualitative range. The variability of 
judgments upon extremely good or extremely poor specimens will 
ordinarily be less than the range of judgments made upon intermedi- 
ate specimens. In most product scales the accurate measurement of 
these extreme specimens is, perhaps, not so important as is the accurate
scaling of those items which constitute the main body of the
scale. For this reason, the assumption that equally often noticed 
differences are equal will give scales which are just as valuable 
practically as those resulting from the use of more refined techniques. 

* Thurstone, L. L., “Equally Often Noticed Differences,” Journal of Educational
Psychology, 1927, 18, 259-293.

Thurstone, L. L., “Psychophysical Analysis,” American Journal of Psychology,
1927, 38, 368-389.
Steps in constructing a product scale may be set down as follows:


(1) Collect a large number of samples of the product to be scaled 
(e.g., handwriting, drawings, jokes, pictures). These specimens 
should range by gradual stages from very poor to excellent. 

(2) Persuade a number of competent persons to act as judges of the 
comparative excellence of the specimens. Instruct these judges 
to compare every specimen with every other specimen, so that a 
consensus may be obtained on each. The order of merit method, 
the paired comparisons method, or some variation of these, 
should ordinarily be employed here, as these experimental tech- 
niques provide a systematic attack upon the problem of ranking 
samples for excellence.* 

(3) Reduce the number of times each specimen is ranked above each
other specimen to percentage terms, and express these percents
as σ-distances between each pair of specimens. To illustrate, if
drawing A is judged better than drawing B by 65% of the group,
A − B = .39σ; if B is judged better than C by 77%, B − C
= .74σ. These σ-differences are read from Table A and are found
in the following way: If a sample is judged better than another
by just 50%, there is no observable difference between the two
and their σ-difference is zero. But if A is judged better than B
by 65%, the difference between A and B (in excess of chance)
is 15%, which from Table A corresponds to a σ-difference of .39.
In exactly the same way the difference between B and C (in
excess of chance) is 27%, which corresponds to a σ-difference of
.74. Figure 55 shows graphically how percentage differences can
be converted into σ-differences. The distributions of judgments
upon A, B, and C are assumed to be normal and are taken to be
equal in range and variability. The mean value of A (its scale
value) is .39σ above the mean value of B, the mean value of
which is, in turn, .74σ above the mean value of C.

(4) Determine a difference for each pair of specimens, and express
each item finally selected for the scale as so many σ-units from
the arbitrary zero. The procedure may be illustrated by two
items, numbers eight and nine, taken from the Hillegas Composition
Scale.† Hillegas had each of 202 judges arrange a
number of English compositions in order of merit. An artificial


* Woodworth, R. S., Experimental Psychology (New York: Henry Holt
& Co., 1938), pp. 372-378.

† Hillegas, Milo B., A Scale for the Measurement of Quality in English Composition
by Young People, Teachers College Record, 1912, 13, 4, 5-55.
FIG. 55 To illustrate σ-scale differences between specimens A, B, and C.
The distributions of judgments on the three specimens are taken
to be normal, and equal in range and variability


composition was selected as being of just zero merit, and assigned
the value of 0 on the scale. Of the 202 judges, 136 or
67.33% ranked specimen 9 as better than specimen 8. From
Table A, we find that a percentage difference of 17.33 (67.33
− 50) indicates a PE difference of .65, and this value expresses
the amount by which 9 is better than 8. The value of specimen 8
had already been found to be 7.72PE * above the zero point on
the scale. Hence, specimen 9 is 7.72 + .65 or 8.37PE above the
zero composition. The values of the nine compositions on the
Hillegas Scale as measured in PE units from the zero composition
are 1.83, 2.60, 3.69, 4.74, 5.85, 6.75, 7.72, 8.37, and 9.37.
Note that the steps on the scale are fairly regular and are about
1PE apart.
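
The conversion described in step (3), from the percent of judges preferring one specimen to a σ-difference, can be sketched as follows. The sketch substitutes Python's `statistics.NormalDist` for the reading of Table A; the function name `sigma_difference` and the choice of specimen C as the zero point are assumptions made only for illustration.

```python
from statistics import NormalDist

def sigma_difference(percent_better):
    """σ-distance between two specimens, given the percent of judges
    who rated the first one better (50% means no difference)."""
    return NormalDist().inv_cdf(percent_better / 100)

# Drawing A preferred to B by 65% of judges; B preferred to C by 77%.
a_minus_b = sigma_difference(65)
b_minus_c = sigma_difference(77)

# Scale values measured from C, taken here as an arbitrary zero
# (a hypothetical choice for this sketch, not a value from the text).
scale = {"C": 0.0, "B": b_minus_c, "A": b_minus_c + a_minus_b}
```

Rounded to two places, the two differences agree with the .39σ and .74σ read from Table A in the text.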


2. Transforming qualitative data into numerical scores 


It is possible to express many kinds of qualitative data in quantitative
terms, if we can assume that measures of the trait or ability
which we have sampled are normally distributed in the population. 
Several techniques based upon the normal curve will be considered 
in this section. 


* The PE was the unit used by Hillegas. PE = .6745σ, p. 97.
(1) THE SCALING OF ANSWERS TO A QUESTIONNAIRE


Answers to the queries or statements in most questionnaires admit 
of several possible replies, such as Yes, No, ?; or Most, Many, Some, 
Few, No; or there are four or five answers one of which is to be 
checked. It is often desirable to “weight” these different alternatives 
in accordance with the degree of divergence from the “typical 
answer" which they indicate. First we assume that the attitude or 
personality trait expressed in answering a given proposition is nor- 
mally distributed. From the percentage who accept each alternative 
answer to a question or statement, we may then find a σ-equivalent,
which will express the value or weight to be given that answer. 
Likert’s * Internationalism Scale furnishes an example of this scal- 
ing technique. This questionnaire contains 24 statements upon each 
of which the subject is requested to give an opinion. Approval or 


TABLE 43 Data for statement No. 16 of the Internationalism Scale


                     Strongly                                      Strongly
Answers              approve   Approve   Undecided   Disapprove   disapprove
Percent checking       13        43         21           13           10
Equivalent σ-values   −1.63     −.43        .43          .99         1.76
Standard scores        34        46         54           60           68


disapproval of any statement is indicated by checking one of five
possibilities: “strongly approve,” “approve,” “undecided,” “disapprove,”
and “strongly disapprove.” The method of scaling as applied
to statement No. 16 on the Internationalism Scale is shown in
Table 43 above. This statement reads as follows:


16. All men who have the opportunity should enlist in the Citi- 
zens’ Military Training Camps. 
Strongly approve Approve Undecided Disapprove 
Strongly disapprove 


The percentage selecting each of the possible answers is shown in
the table. Below the percent entries are the σ-equivalents assigned
to each alternative on the assumption that opinion on the question
is normally distributed, that is, that few will wholeheartedly agree or
disagree, and many take intermediate views. The σ-values in Table 43


* Likert, R., A Technique for the Measurement of Attitudes, Archives of Psychology,
1932, No. 140.
have been obtained from Table H (p. 435) in the following way:
Reading down the first column headed 0, we find that beginning at
the upper extreme of the normal distribution, the highest 10% has an
average σ-distance from the mean of 1.76. Said differently, the mean
of the 10% of cases at the upper extreme of the normal curve is at a
distance of 1.76σ from the mean of the whole distribution. Hence,
the answer “strongly disapprove” is given a σ-equivalent of 1.76
(see Fig. 56).


FIG. 56 To illustrate the scaling of the five possible answers to statement
16 on Likert's Internationalism Scale (baseline marked in σ-units from −3σ to +3σ)


To find the σ-value for the answer “disapprove,” we select the
column headed .10 and running down the column take the entry
opposite 13, namely, .99. This means that when 10% of the distribution
reading from the upper extreme have been accounted for, the
average distance from the mean of the next 13% is .99σ. Reference
to Figure 56 will make this clearer. Now from the column headed
.23 (13% + 10% “used up” or accounted for), we find entry .43 opposite
21. This means that when the 23% at the upper end of the distribution
have been cut off, the mean σ-distance from the general
mean of the next 21% is .43σ, which becomes the weight of the preference
“undecided.” The weight of the fourth answer “approve” must
be found by a slightly different process. Since a total of 44% from
the upper end of the distribution have now been accounted for, 6%
of the 43% who marked “approve” will lie to the right of the mean,
and 37% to the left of the mean, as shown in Figure 56. From the
column headed .44 in Table H, we take .08 (entry opposite 6%)
which is the average distance from the general mean of the 6% lying
just above the mean. Then from the column headed .13 (50% − 37%)
we take entry .51 (now −.51) opposite 37%, as the mean distance
from the general mean of the 37% just below the mean. The
algebraic sum $\frac{(6 \times .08) + (37 \times -.51)}{43}$ is −.43, which is the weight
assigned to the preference “approve.” The 13% left, those marking
“strongly approve,” occupy the 13% at the extreme (low end) of the
curve. Returning to the column headed 0, we find that the mean distance
from the general mean of the 13% at the extreme of the distribution
is −1.63σ.
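
The readings from Table H can be checked by direct calculation. The sketch below is not the Table H procedure itself: it assumes the standard result that the mean σ-distance of a slice of the unit normal curve equals the difference of the ordinates at its two cut points divided by the proportion of cases in the slice. The function name `sigma_equivalents` is illustrative, and every category is assumed to hold at least some cases.

```python
from statistics import NormalDist

def sigma_equivalents(percents):
    """Mean σ-distance of each successive slice of a unit normal curve.

    percents: category percentages ordered from the low end of the
    curve to the high end; each is assumed to be greater than zero.
    """
    nd = NormalDist()
    total = sum(percents)
    weights = []
    cumulative = 0.0
    for p in percents:
        lower = cumulative / total
        upper = (cumulative + p) / total
        # Ordinate of the curve at each cut point; zero in the open tails.
        ord_lower = nd.pdf(nd.inv_cdf(lower)) if 0 < lower < 1 else 0.0
        ord_upper = nd.pdf(nd.inv_cdf(upper)) if 0 < upper < 1 else 0.0
        weights.append((ord_lower - ord_upper) / (upper - lower))
        cumulative += p
    return weights

# Statement 16, ordered from "strongly approve" (low end of the curve)
# to "strongly disapprove" (high end), as in Table 43.
table_43 = sigma_equivalents([13, 43, 21, 13, 10])
```

The five computed weights agree with the σ-values of Table 43 to within about .01σ; small differences in the second decimal reflect the rounding of the printed table.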

In order to avoid negative values, each σ-weight in Table 43 can
be expressed as a σ-distance from −3.00σ (or −5.00σ). If referred
to −3.00σ, the weights become in order 1.37, 2.57, 3.43, 3.99, and 4.76.
Dropping decimals, and taking the first two digits, we could also assign
weights of 14, 26, 34, 40, and 48. Again each σ-value in Table 43
may be expressed as a standard score in a distribution the mean of
which is 50 and the σ 10. The category “strongly approve” is
−16 (−1.63 × 10) from the mean of 50, or at 34. Category “approve”
is −4 (−.43 × 10) from 50 or at 46. The other three categories
have standard scores of 54, 60, and 68.
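
Both ways of removing negative weights are simple linear changes of the σ-values, as the following short sketch shows. The function names `shifted_weight` and `standard_score` are illustrative only.

```python
def shifted_weight(z, zero_point=-3.0):
    """σ-value expressed as a distance from an arbitrary zero point."""
    return round(z - zero_point, 2)

def standard_score(z, mean=50, sd=10):
    """σ-value expressed as a standard score, rounded to a whole number."""
    return round(mean + sd * z)

# σ-values of the five answers in Table 43.
sigma_values = [-1.63, -0.43, 0.43, 0.99, 1.76]

shifted = [shifted_weight(z) for z in sigma_values]
standard = [standard_score(z) for z in sigma_values]
```

Because each transformation is linear, the relative spacing of the five answers is the same on either scale; only the zero point and the size of the unit change.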

When all 24 statements on the Internationalism Scale have been
scaled as shown above, a person's “score” (his attitude toward internationalism
in general) is found by adding up the weights assigned
to the various preferences which he has selected. An individual
whose opinions are extreme, e.g., who tends strongly to disapprove
many statements, will receive a proportionally larger total score
when the choices are σ-scaled than he would receive if the five possibilities
were assigned arbitrary weights of 1, 2, 3, 4, and 5. It has
been shown, however, that σ-scaling yields results which, for the test
as a whole, are little if any more reliable or more discriminatory than
the results obtained when the five answers are scored simply 1, 2, 3,
4, and 5. This virtual equality of scaling and rule-of-thumb method
is a rather familiar finding in mental measurement. In the present
instance, it probably arises from the fact that the greater differentiation
which the σ-scaling technique provides for single items is lost in
the process of adding or averaging the score weights from many
items. A real advantage of σ-scaling is that the units of the scale
are equal and may be compared from item to item or from scale to
scale. Also, σ-scaling gives a more accurate picture of the extent to
which extreme or biased opinions on a given question are divergent
from the typical opinion than does the arbitrary weighting method.


(2) THE SCALING OF RATINGS 


In many psychological problems individuals are rated or ranked 
for their possession of characteristics or attributes not readily meas- 
ured in terms of performance. Honesty, interest in one's work, tact- 
fulness, originality, are illustrations of such traits. Suppose that two 
teachers A and B have rated a group of forty pupils for “social re- 
sponsibility” on a 5-point scale. A rating of 1 means that the trait 
is possessed in marked degree, a rating of 5 that it is almost if not 
entirely absent, and ratings of 2, 3, and 4 indicate intermediate 
degrees. Assume that the percentage of children assigned each rating 
is as follows: 


Rating    A      B
1        10%    20%
2        15%    40%
3        50%    20%
4        20%    10%
5         5%    10%


It is obvious that B rates more leniently than A, so that a rating
of 1 by B may not represent the same degree of “social responsibility”
as a rating of 1 by A. Can we assign “weights” or numerical
scores so as to make the ratings of the two teachers comparable?
The answer is “yes,” provided we can assume that the distribution
of the trait “social responsibility” is normal, and that one teacher is
as competent a judge as the other. From Table H, we may read
σ-equivalents to the percents given each rating by A and B as
follows:


Rating     A       B
1         1.76    1.40
2          .95     .27
3          .00    −.53
4        −1.07   −1.04
5        −2.10   −1.76


These σ-values are read from Table H in exactly the same way as
were the σ-equivalents in the previous problem (p. 431). If we
assume −3.00σ as an arbitrary reference point, the σ-values for the
ratings of A and B all become positive:


Rating     A       B
1         4.76    4.40
2         3.95    3.27
3         3.00    2.47
4         1.93    1.96
5          .90    1.24


Dropping decimals, and taking only the first two digits, A's and B's 
ratings become: 


Rating     A     B
1         48    44
2         40    33
3         30    25
4         19    20
5          9    12


or, expressed as standard scores in a distribution with a mean of 50
and a σ of 10,


Rating A B 
1 68 64 
2 60 53 
3 50 45 
4 39 40 
5 29 32 


The ratings of A and B may be combined by adding or by averag- 
ing them. 

Table H will prove valuable in enabling one to transmute many
kinds of qualitative data into quantitative terms or scores. Almost
any attribute upon which relative judgments can be obtained may be
assigned scores in a normal distribution in terms of the σ of the
judgments.


(3) CHANGING ORDERS OF MERIT INTO NUMERICAL SCORES 


It is often desirable to transmute orders of merit into units of
amount or “scores.” This may be done by means of tables, if we are
justified in assuming normality for the trait. To illustrate, suppose
that 15 salesmen have been ranked in order of merit for selling efficiency,
the most efficient salesman being ranked 1, the least efficient
being ranked 15. If we are justified in assuming that “selling efficiency”
follows the normal probability curve in the general population
we can, with the aid of Table 44 (p. 324), assign to each man a
“selling score” on a scale of 10 or of 100 points. Such a score will
define ability as a salesman better than will a rank of 2, 5, or 14.
The problem may be stated specifically as follows:
Example (1) Given 15 salesmen, ranked in order of merit by
their sales manager, (a) transmute these rankings into scores on a 
scale of 10 points; (b) a scale of 100 points. 


First, by means of the formula

$$\text{Percent position} = \frac{100(R - .5)}{N} \qquad (75)$$

(formula for converting ranks into percents of the normal curve)

in which R is the rank of the individual in the series * and N is the
number of individuals ranked, determine the “percent position” of
each man. Then from these percent positions read the man's score on
a scale of 10 or 100 points from Table 44. Salesman A, who ranks
No. 1, has a percent position of 100(1 − .5)/15 or 3.33, and his score
from Table 44 is 9 or 85 (finer interpolation unnecessary). Salesman B,
who ranks No. 2, has a percent position of 100(2 − .5)/15
or 10, and his score, accordingly, is 8 or 75. The scores of the other
salesmen, found in exactly the same way, are given in Table 45.


TABLE 44 The transmutation of orders of merit into units of amount or
“scores” †

Example: If N = 25, and R = 3, Percent Position is 100(3 − .5)/25 or 10
(formula 75), and from the table, the equivalent score is 75, on a scale
of 100 points.

Percent  Score     Percent  Score     Percent  Score
   .09    99        22.32    65        83.31    31
   .20    98        23.88    64        84.56    30
   .32    97        25.48    63        85.75    29
   .45    96        27.15    62        86.89    28
   .61    95        28.86    61        87.96    27
   .78    94        30.61    60        88.97    26
   .97    93        32.42    59        89.94    25
  1.18    92        34.25    58        90.83    24
  1.42    91        36.15    57        91.67    23
  1.68    90        38.06    56        92.45    22
  1.96    89        40.01    55        93.19    21
  2.28    88        41.97    54        93.86    20
  2.63    87        43.97    53        94.49    19
  3.01    86        45.97    52        95.08    18
  3.43    85        47.98    51        95.62    17
  3.89    84        50.00    50        96.11    16
  4.38    83        52.02    49        96.57    15


* A rank is an interval on a scale; .5 is subtracted from each R because its
midpoint best represents an interval. E.g., R = 5 is the 5th interval, namely
4-5, and 4.5 (or 5 − .5) is the midpoint.

† From Hull, C. L., “The Computation of Pearson's r from Ranked Data,”
Journal of Applied Psychology, 1922, 6, pp. 385-390.
TABLE 44 (Continued)


Percent  Score     Percent  Score     Percent  Score
  4.92    82        54.03    48        96.99    14
  5.51    81        56.03    47        97.37    13
  6.14    80        58.03    46        97.72    12
  6.81    79        59.99    45        98.04    11
  7.55    78        61.94    44        98.32    10
  8.33    77        63.85    43        98.58     9
  9.17    76        65.75    42        98.82     8
 10.06    75        67.58    41        99.03     7
 11.03    74        69.39    40        99.22     6
 12.04    73        71.14    39        99.39     5
 13.11    72        72.85    38        99.55     4
 14.25    71        74.52    37        99.68     3
 15.44    70        76.12    36        99.80     2
 16.69    69        77.68    35        99.91     1
 18.01    68        79.17    34       100.00     0
 19.39    67        80.61    33
 20.93    66        81.99    32

It has been frequently pointed out that the assumption of normal- 
ity in a trait implies that differences at extremes of the trait are rela- 
tively much greater than differences around the mean. This is clearly 
brought out in Table 45; for, while all differences in the order of 
merit series equal 1, the differences between the transmuted scores 
vary considerably. The largest differences are found at the ends of
the series, the smallest in the middle. For example, the difference in
score between A and B or between N and O (on a scale of 100) is
three times the difference between G and H. Clearly, it is three
times as hard for a salesman to improve sufficiently to move from 
second to first place as it is to move from eighth to seventh place. 


TABLE 45 The order of merit ranks of 15 salesmen converted into normal
curve “scores”


            Order of     Percent        Scores (Table 44)
Salesmen   Merit Ranks   Position    Scale (10)  Scale (100)    PR's

A               1          3.33          9           85          97
B               2         10.00          8           75          90
C               3         16.67          7           69          83
D               4         23.33          6           64          77
E               5         30.00          6           60          70
F               6         36.67          6           57          63
G               7         43.33          5           53          57
H               8         50.00          5           50          50
I               9         56.67          5           47          43
J              10         63.33          4           43          37
K              11         70.00          4           40          30
L              12         76.67          4           36          23
M              13         83.33          3           31          17
N              14         90.00          2           25          10
O              15         96.67          1           15           3
The percentile ranks (PR's) of our 15 salesmen in example (1)
have been entered in Table 45 for comparison with the normal curve
scores. These PR's were calculated by means of the following formula,
which converts orders of merit into percentile ranks,

$$PR = 100 - \frac{100R - 50}{N} \qquad (76)$$

(percentile ranks for individuals arranged in order of merit)

The R in the formula is the rank position of the individual, counting
No. 1 as the highest rank. Thus, the salesman who ranks No. 1
in 15 has a PR of 100 − (100 × 1 − 50)/15 = 96.67 or 97; the salesman
who ranks 5th has a PR of 100 − (100 × 5 − 50)/15 = 70. Note that
the steps between adjacent PR's are all equal. Orders of merit as
well as PR's assume the distribution of ability to be rectangular so
that equal slices of area correspond directly to equal distances along
the baseline.

If there are 100 subjects in our group, each occupies one division
of the percentile scale. Hence the rank of the poorest subject is .5
(midpoint of the interval 0-1) and the rank of the best subject is
99.5 (midpoint of interval 99-100). The person who ranks 50th in
the group has a PR of 100 − (100 × 50 − 50)/100 or 50.5, midpoint of
interval 50-51. Since a subject's PR is always the midpoint of an
interval on a scale which runs from 0 to 100, it follows that no one
can have a PR of 0 or 100. These two points constitute the boundaries
or limits of the percentile scale.
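
Formulas (75) and (76) are easily put into code, which makes their relation plain: a person's PR is simply 100 minus his percent position. The function names below are illustrative.

```python
def percent_position(rank, n):
    """Formula (75): percent position of rank R among N persons."""
    return 100 * (rank - 0.5) / n

def percentile_rank_from_rank(rank, n):
    """Formula (76): percentile rank, counting No. 1 as the highest rank."""
    return 100 - (100 * rank - 50) / n

# The two salesmen worked out in the text (N = 15).
first_position = percent_position(1, 15)
second_position = percent_position(2, 15)
fifth_pr = percentile_rank_from_rank(5, 15)

# The 50th person in a group of 100 sits at the midpoint of interval 50-51.
middle_pr = percentile_rank_from_rank(50, 100)
```

Since R is never less than 1 nor greater than N, the PR computed in this way always falls strictly between 0 and 100.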

Another use to which Table 44 may be put is in the combination of 
incomplete order of merit rankings. To illustrate: 


Example (2) Six persons, A, B, C, D, E, and F, are to be ranked 
for honesty by three judges. Judge 1 knows all six well enough to 
rank them; Judge 2 knows only three well enough to rank them; 
and Judge 3 knows four well enough to rank them. Can we obtain 
a fair composite order of merit ranking for all six persons by com- 
bining these three sets of rankings, two of which are incomplete? 


We may tabulate our data as follows: 


                            Persons
                     A    B    C    D    E    F
Judge 1's ranking    1    2    3    4    5    6
Judge 2's ranking         2         1         3
Judge 3's ranking    2         1         3    4




It seems fair that A should get more credit for ranking first in a list 
of six than D for ranking first in a list of three, or C for ranking first 
in a list of four. In the order of merit ratings, all three individuals are 
given the same rank. But when we assign scores to each person, in 
accordance with his position in the list, by means of formula 75 and
Table 44, A gets 77 for his first place, D gets 69 for his, and C gets 73
for his. See table below: 


                            Persons
                     A     B     C     D     E     F
Judge 1's ranking    1     2     3     4     5     6
          score     77    63    54    46    37    23
Judge 2's ranking          2           1           3
          score           50          69          31
Judge 3's ranking    2           1           3     4
          score     56          73          44    27
Sum of scores      133   113   127   115    81    81
Mean                67    57    64    58    41    27
Order of Merit       1     4     2     3     5     6


All of the ratings have been transmuted as shown in example (1)
above. Separate scores may be combined and averaged to give the 
final order of merit shown in the table. 
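
The combination of the three sets of rankings can be sketched as follows. The scores are those read from Table 44 for lists of six, three, and four persons, as entered in the table above; the assignment of ranks to persons follows the sums of scores in that table. The function name `composite_order` and the data layout are illustrative only.

```python
from statistics import mean

# Scores from Table 44 for each rank in lists of 6, 3, and 4 persons
# (the values entered in the table of example 2).
scores_by_list_size = {
    6: [77, 63, 54, 46, 37, 23],
    3: [69, 50, 31],
    4: [73, 56, 44, 27],
}

# Each judge's ranking: person -> rank. Persons a judge does not know
# well enough to rank are simply omitted from his list.
rankings = [
    {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6},  # Judge 1
    {"D": 1, "B": 2, "F": 3},                          # Judge 2
    {"C": 1, "A": 2, "E": 3, "F": 4},                  # Judge 3
]

def composite_order(rankings, scores_by_list_size):
    """Average each person's transmuted scores and rank the averages."""
    collected = {}
    for ranking in rankings:
        table = scores_by_list_size[len(ranking)]
        for person, rank in ranking.items():
            collected.setdefault(person, []).append(table[rank - 1])
    means = {person: mean(vals) for person, vals in collected.items()}
    order = sorted(means, key=means.get, reverse=True)
    return means, order

means, order = composite_order(rankings, scores_by_list_size)
```

Averaging, rather than summing, is what keeps F from being credited for having been ranked by three judges instead of two.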

By means of formula (75) and Table 44 it is possible to convert
any set of ranks into “scores,” if we may assume a normal distribution
in the trait for which the ranking is made. The method is useful
in the case of those attributes which are not easily measured by ordinary
methods, but for which individuals may be arranged in order
of merit, as, for example, athletic ability, personality, beauty, and
the like. It is also valuable in correlation problems when the only
available criterion * of a given ability or aptitude is a set of ranks.
Transmuted scores may be combined or averaged like other test
scores.

A word of explanation may be added with regard to Table 44. This
table represents a normal frequency distribution which has been cut
off at ±2.50σ. The baseline of the curve is 5σ, divided into 100 parts,
each .05σ long. The first .05σ from the upper limit of the curve takes
in .09 of 1% of the distribution and is scored 99 on a scale of 100.
The next .05σ carries us .10σ from the upper end of the curve; the
area above this point is .20 of 1% of the entire distribution, and the
point is scored 98. In each case, the
percent position gives the fractional part of the normal distribution
which lies to the right of (above) the given “score” on the baseline.
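
The construction just described can be imitated in code, which gives a means of checking entries in Table 44. The sketch rests on one assumption not stated in the text: that the percents are taken relative to the area between −2.50σ and +2.50σ rather than to the whole curve. All names are illustrative, and `statistics.NormalDist` stands in for the printed normal tables.

```python
from statistics import NormalDist

_ND = NormalDist()
_LIMIT = 2.5                          # curve cut off at ±2.50σ
_UPPER = _ND.cdf(_LIMIT)              # area below +2.50σ
_SPAN = _UPPER - _ND.cdf(-_LIMIT)     # area kept after truncation (assumed base)

def percent_above(score):
    """Percent of the truncated curve lying above a score of 0 to 100."""
    z = (score - 50) * 0.05           # each score step is .05σ
    return 100 * (_UPPER - _ND.cdf(z)) / _SPAN

def score_for_percent(percent_position):
    """Nearest whole score on the 100-point scale for a percent position."""
    target = _UPPER - (percent_position / 100) * _SPAN
    z = _ND.inv_cdf(target)
    return round(50 + z / 0.05)

def scores_for_ranks(n):
    """Scores for ranks 1..n, using formula (75) for the percent positions."""
    return [score_for_percent(100 * (rank - 0.5) / n) for rank in range(1, n + 1)]

salesmen_scores = scores_for_ranks(15)
```

Under this assumption the sketch returns the 100-point scores of Table 45 for the 15 salesmen, and its percents agree with the entries of Table 44 to within about .01.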


PROBLEMS 


1. Five problems are passed by 15%, 34%, 50%, 62%, and 80%, respectively,
of a large unselected group. If the zero point of ability in this
test is taken to be at −3σ, what is the σ-value of each problem as
measured from this point?


* For definition of a criterion, see Chapter 13, p. 345.

2. (a) The fifth grade norms for a reading examination are Mean = 60
and SD = 10; for an arithmetic examination, Mean = 26 and
SD = 4. Tom scores 55 on the reading and 24 on the arithmetic
test. Compare his σ-scores. In which test is he better?

(b) Compare his standard scores in a distribution with M of 100 and
SD of 20.

3. (a) Locate the deciles in a normal distribution in the following way.
Beginning at −3σ, count off successive 10%'s of area up to +3σ.
Tabulate the σ-values of the points which mark off the limits of
each division. For example, the limits of the first 10% from −3σ
are −3.00σ and −1.28σ (see Table A). Label these points in order
from −3σ as .10, .20, etc. Now compare the distances in terms of
σ between successive ten percent points. Explain why these distances
are unequal.

(b) Divide the baseline of the normal probability curve (take as 6σ)
into ten equal parts, and erect a perpendicular at each point of
division. Compute the percentage of total area comprised by each
division. Are these percents of area equal? If not, explain why.
Compare these percents with those found in (a).

4. Fifty workers are rated on a 7-point scale for efficiency on the job. 
The following data represent the distributions of ratings (in which 1 is 
best and 7 worst) for two judges. Judge X is obviously very lenient and 
Judge Z is very strict. To make these two sets of judgments compara- 
ble, use the following three procedures: 

(a) Percentile scaling: divide each distribution into 5 parts by finding
successive 20%'s of N. Let A = first 20%, B the next 20%, and so
on to E, the fifth 20%.

(b) Standard scores: Find the M and SD for each distribution and convert
each rating into a common distribution with M of 50 and SD
of 10.

(c) T-scores: Find T-scores corresponding to ratings of 1, 2, 3 . . . 7.
Now compare Judge X's rating of 3 with Judge Z's rating of 3 by
the three methods.

    Judge X               Judge Z
Rating     f          Rating     f
  1        5            1        2
  2       10            2        4
  3       20            3        4
  4        5            4        5
  5        4            5       20
  6        4            6       10
  7        2            7        5

        N = 50                N = 50
5. In a large group of competent judges, 77% rank composition A as better
than composition B; 65% rank B as better than C. If C is known to
have a σ-value of 3.50 as measured from the “zero composition,” i.e.,
the composition of just zero merit, what are the σ-values of B and A as
measured from this zero point?

6. Twenty-five men on a football squad are ranked by the coach in order
of merit from 1 to 25 for all-around playing ability. On the assumption
that general playing ability is normally distributed, transmute these
ranks into “scores” on a scale of 100 points. Compare these scores with
the PR's of the ranks.

7. (a) In accordance with their scores upon a learning test, 20 children
are ranked in order of merit. Calculate the percentile rank of each
child.

(b) If 60 children are ranked in order of merit, what is the percentile
rank of the first, tenth, fortieth, and sixtieth?

8. On an Occupational Interest Blank, each occupation is followed by five
symbols, L! L ? D D!, which denote different degrees of “liking” and
“disliking.” The answers to one item are distributed as follows:

L!     L      ?      D      D!
8%    20%    38%    24%    10%

(a) By means of Table H convert these percents into σ-units.

(b) Express each σ-value as a distance from “zero,” taken at −3σ, and
multiply by 10 throughout.

(c) Express each σ-value as a standard score in a distribution of mean
50, σ 10.

9. Letter grades are assigned three classes by their teachers in English,
history, and mathematics, as follows:


Mark    English    History    Mathematics
A         25         11           6
B         21         24          15
C         32         20          25
D          6          8          20
F          1          2           8
          85         65          74


(a) Express each distribution of grades in percents, and by means of
Table H transform these percents into σ-values.

(b) Change these σ-values into 2-digit numbers and into standard
scores following the method on page 305.

(c) Find average grades [from (b)] for the following students:


Student    English    History    Mathematics
S.H.          A          B           C
F.M.          C          B           A
D.B.          B          D           F
10. Calculate T-scores in the following problem:

                 Percent below given score
                 plus one-half reaching
Scores     f     given score                  T-score
91         2           99.5                     76
90         4           98.0                     71
89         6
88        20
87        24
86        28
85        40
84        36
83        24
82        12
81         4

11. Calculate T-scores for the midpoints of the class-intervals in the following
distribution:

                 Percent below given interval
                 plus one-half reaching
Scores     f     midpoint                     T-score
40-44      8           94.6                     66
35-39     12
30-34     20
25-29     15
20-24     15
15-19      5
          75

ANSWERS


1. In order: 4.04; 3.41; 3.00; 2.69; 2.16.
2. (a) In neither, same score in both
(b) Reading 90, Arithmetic 90

3. (a) Points:   .00    .10    .20    .30    .40    .50    .60    .70    .80    .90   1.00
       σ:       −3.00  −1.28   −.84   −.52   −.25    0      .25    .52    .84   1.28   3.00
       Diffs:       1.72    .44    .32    .27    .25    .25    .27    .32    .44   1.72

(b) Percents of area in order: .68; 2.77; 7.92; 15.92; 22.57; 22.57;
15.92; 7.92; 2.77; .68.

4. (a) C vs. A; (b) 52 vs. 61; (c) 50 vs. 60
5. B, 3.89; A, 4.63

6. Rank:    1   2   3   4   5   6   7   8   9  10  11  12  13
   Score:  89  80  75  71  68  65  63  60  58  56  54  52  50
   PR's:   98  94  90  86  82  78  74  70  66  62  58  54  50

   Rank:   14  15  16  17  18  19  20  21  22  23  24  25
   Score:  48  46  44  42  40  37  35  32  29  25  20  11
   PR's:   46  42  38  34  30  26  22  18  14  10   6   2

8.        L!      L      ?      D     D!
   (a)  −1.86   −.94   −.08
   (b)    11     21     29     38
   (c)    31     41     49     58

9. (a)             F       D       C       B       A
       English   −2.70   −1.74   −.65     .22    1.18
       History   −2.28   −1.38   −.53     .39    1.49
       Math.     −1.71    −.71    .18     .94    1.86

   (b)       English                 History                 Mathematics
        −3.00σ  Stan. Score     −3.00σ  Stan. Score     −3.00σ  Stan. Score
   A      42        62            45        65            49
   B      32        52            34        54            39
   C      24        44            25        45            31
   D      13        33            16        36            23
   F       3        23             7        27            13

   (c) S. H., 36 or 56; F. M., 36 or 56; D. B., 20 or 40

10. T-scores: 76, 71, 67, 62, 58, 54, 49, 44, 39, 34, 27

11. T-scores: 66, 59, 53, 47, 40, 32

THE RELIABILITY AND VALIDITY 
OF TEST SCORES 


* 


I. The Reliability of Test Scores


The reliability of a test, as of any measuring instrument, depends 
upon the consistency with which it gauges the abilities of those to 
whom it has been applied. When a test is reliable, scores made by the 
members of a group—upon retest with the same test or with alter- 
nate forms of the same test—will differ very little or not at all from 
their original values. A reliable test, therefore, is relatively free of 
chance errors of measurement, and scores earned on it are stable and 
trustworthy. If a subject scores 84, say, on a reliable test, we feel 
confident that this score is close to his true achievement. Scores 
made on an unreliable test, on the other hand, are subject to large 
errors of measurement and are neither stable nor trustworthy. When 
a test is unreliable, subsequent testings will reveal many discrepan- 
cies between scores achieved by the same persons on different 
occasions. 


1. Methods of determining test reliability 


There are three procedures in common use for determining the reli- 
ability (sometimes called the self-correlation) of a test. These are 
(1) the test-retest (repetition) method; (2) the alternate or parallel 
forms method; and (3) the split-half method. In addition to these 
three, a fourth method—the method of “rational equivalence"—is 
also being widely used. All of these procedures furnish “estimates” 
of the reliability of test scores; sometimes one method and sometimes 
another will give the best estimate.


(1) TEST-RETEST (REPETITION) METHOD


Repetition of a test is the simplest method of determining reliabil- 
ity: the test is given and then repeated on the same group and the 
correlation is calculated between the first and second sets of scores. 
While the test-retest method is sometimes the only feasible proce- 
dure, it is open to various objections. If the test is repeated immedi- 
ately, many subjects will recall their first answers and spend their 
time on new material, thus increasing their scores. Besides the 
memory effect, practice and the confidence induced by familiarity 
with the material will almost certainly affect scores when one takes 
a test for the second time. Transfer effects are likely to be different 
from person to person. If the net effect of transfer is to make for 
closer agreement between scores achieved on the first and second 
giving of a test than would otherwise be the case, the reliability co- 
efficient will be too high. When a sufficient time interval has elapsed 
between the first and second administrations of the test to offset (in 
part, at least) memory, practice, and other effects, the reliability 
coefficient will be a closer estimate of the actual consistency of test 
scores. If the interval between tests is long, however (say, six 
months or so), and the subjects are children, growth or maturity 
changes will affect the retest. 

The test-retest method will estimate less accurately the reliability 
of tests which contain novel features and which are highly suscepti- 
ble to practice than it will the reliability of tests involving routine 
operations little affected by practice. Because of the difficulty in 
controlling the conditions which influence scores on different admin- 
istrations of a test, the test-retest method is used less generally than 
are the other two methods. 
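
The test-retest computation described above can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the helper name `pearson_r` and the eight pairs of scores are hypothetical, and the coefficient is the ordinary product-moment correlation between the first and second sets of scores.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores of eight subjects on two administrations of one test.
first = [84, 72, 65, 90, 58, 77, 69, 81]
second = [86, 70, 68, 91, 61, 75, 72, 80]

# The correlation between the two administrations is the reliability estimate.
retest_reliability = pearson_r(first, second)
print(round(retest_reliability, 2))
```

With real data the two lists must hold the same subjects in the same order, since the coefficient depends on the pairing of scores.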


(2) ALTERNATE OR PARALLEL FORMS METHOD 


When alternate or parallel forms of a test have been constructed, 
the correlation between Form A, say, and Form B is taken as a 
measure of the self-correlation of the test. This method is employed 
by the authors of most standard psychological and educational tests, 
for which alternate forms are usually available. 

The alternate forms method is satisfactory if sufficient time has 
intervened between the administration of the two forms to weaken 
or eliminate memory and practice effects. When Form B of a test 
follows Form A very closely, scores on the second test will usually
be increased through practice and familiarity. When such increases
are approximately constant (say, three to five points for
each score) the reliability coefficient of the test will not be affected,
since paired A and B scores maintain their same relative positions
in the two distributions. When the mean increase due to practice
has been determined, a constant amount can be subtracted from
Form B scores to make them comparable to Form A scores.* In
drawing up alternate forms of a test, one should be careful to match 
test materials for content, difficulty, and form; but one must be 
careful not to make the test forms too much alike. If alternate forms 
are practically identical, the reliability coefficient of the test will be 
too high; while if parallel forms are not sufficiently “duplicate” the 
reliability coefficient will be too low. 


(3) THE SPLIT-HALF METHOD 


In the split-half method the test is broken into two equivalent
parts and the correlation of these half tests is computed. From the 
half-test reliability, the self-correlation of the whole test is esti- 
mated by the Spearman-Brown formula described on page 339. 

The split-half method is employed when it is not feasible to construct
an alternate form of the test nor wise to repeat the test. This
situation occurs with many performance tests, as well as with tests 
and questionnaires dealing with personality traits, attitudes, and 
the like. A performance test (e.g., picture completion, puzzle solving,
form board) is often a very different task when repeated, as the child 
is familiar with procedure and content. Likewise, many personality 
tests cannot be given in alternate form nor repeated because of radi- 
cal changes in the subject’s attitude and interests when taking such 
tests for the second time. 

The split-half method is often regarded as the best of the methods
for determining test reliability. Perhaps its main advantage is that
all of the data for determining test reliability are obtained upon one 
occasion; hence variations introduced by differences between the two 
testing situations are eliminated. A disadvantage of the split-half 
method is that chance errors may affect the scores on both halves of 
the test in the same way, thus tending to make the reliability coefficient
too high. The longer the test, the less the probability that
the effects of temporary and variable disturbances will be cumulative
and in one direction, and the more accurate the estimate of
reliability.

* In the Otis Self-Administering Test of Mental Abilities, Higher Examination,
for instance, the author suggests that when Form B, which is slightly more
difficult than Form A, is given first, 4 points be added to each score. This is to
make scores equivalent to the norms for Form B when this test is given after
Form A, as it usually is. See Manual of Directions, Otis S-A Test (Yonkers:
World Book Co., 1928), p. 2.

Objection has been raised to the split-half method on the ground 
that a test can be divided into two parts in a variety of ways so that 
the reliability coefficient is not a unique value. This criticism is 
strictly true only when items are of equal difficulty. When items are 
placed in order of merit from least to most difficult, the split into odds 
and evens gives a unique determination of the reliability coefficient. 


(4) THE METHOD OF “RATIONAL EQUIVALENCE” 


The method of rational equivalence * represents an attempt to get 
an estimate of the reliability of a test, free from the objections raised 
against the methods outlined above. Two forms of a test are defined
as “equivalent” when corresponding items a, A, b, B, etc., are interchangeable;
and when the inter-item correlations are the same for
both forms. The method of rational equivalence stresses the inter- 
correlations of the items in the test and the correlations of the items 
with the test as a whole. Four formulas for determining test reliabil- 
ity have been derived, of which the one given below is perhaps the 
most useful: 

$$r_{11} = \frac{n}{n-1} \times \frac{\sigma_t^2 - \sum pq}{\sigma_t^2} \qquad (77)$$
(reliability coefficient of a test in terms of the difficulty
and the intercorrelations of test items)


in which:
$r_{11}$ = reliability coefficient of the whole test;
$n$ = number of items in the test;
$\sigma_t$ = the SD of the test scores;
$p$ = the proportion of the group answering a test item correctly;
$q = (1 - p)$ = the proportion of the group answering a test item
incorrectly.


To apply formula (77) the following steps are necessary: 


Step 1


Compute the SD of the test scores for the whole group, namely, $\sigma_t$.


* Kuder, G. F., and Richardson, M. W., “The Theory of Estimation of Test
Reliability,” Psychometrika, 1937, 2, 151-160.
Richardson, M. W., and Kuder, G. F., “The Calculation of Test Reliability
Coefficients Based upon the Method of Rational Equivalence,” Journal of Educational
Psychology, 1939, 30, 681-687.


Step 2


Find the proportions passing each item (p) and the proportions 
failing each item (q). 


Step 3 


Multiply p and q for each item and sum for all items. This gives
$\sum pq$.


Step 4 


Substitute the calculated values in formula (77). 

To illustrate, suppose that a test of sixty items has been administered
to a group of eighty-five subjects; $\sigma_t$ = 8.50 and $\sum pq$ = 12.43.
Applying (77) we have

$$r_{11} = \frac{60}{59} \times \frac{72.25 - 12.43}{72.25} = .842$$

which is the reliability coefficient of the test.
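
The four steps for formula (77) can be sketched in a few lines. The function names `kr20` and `sum_pq_from_items` are illustrative assumptions, not part of the original method description; the second helper assumes each subject's answers are recorded as 1 (pass) or 0 (fail).

```python
def kr20(n_items, sd_total, sum_pq):
    """Formula (77): reliability from the number of items, the SD of the
    total scores, and the sum of p*q over all items."""
    variance = sd_total ** 2
    return (n_items / (n_items - 1)) * (variance - sum_pq) / variance


def sum_pq_from_items(item_matrix):
    """Steps 2 and 3: sum of p*q over items.

    Rows are subjects; each entry is 1 (item passed) or 0 (item failed).
    """
    n_subjects = len(item_matrix)
    n_items = len(item_matrix[0])
    total = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in item_matrix) / n_subjects
        total += p * (1 - p)
    return total


# Worked example from the text: sixty items, SD = 8.50, sum of pq = 12.43.
r11 = kr20(n_items=60, sd_total=8.50, sum_pq=12.43)
print(round(r11, 3))  # 0.842
```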

A simple approximation to formula (77) has been devised.* This
formula is useful to teachers and others who want to determine
quickly the reliability of short objective classroom examinations or
other tests. It reads:


$$r_{11} = \frac{n\sigma_t^2 - M(n - M)}{\sigma_t^2 (n - 1)} \qquad (78)$$


[approximation to formula (77)]
in which
$r_{11}$ = reliability of the whole test;
$n$ = number of items in the test;
$\sigma_t$ = SD of the test scores;
$M$ = the mean of the test scores.


Formula (78) is a labor saver since only the mean, SD and number 
of items in the test need be known in order to get an estimate of reli- 
ability. The correlation need not be computed between alternate 
forms or between halves of the test. Suppose that an objective test 
of forty multiple-choice items has been administered to a small class
of students. An item answered correctly is scored 1, an item answered
incorrectly is scored 0. The mean test score is 25.70 and
$\sigma_t$ = 6.00. What is the reliability coefficient of the test? Substituting
in (78), we have

$$r_{11} = \frac{40 \times 36.00 - 25.70(40 - 25.70)}{36.00 \times 39} = .76$$

* Froelich, G. J., “A Simple Index of Test Reliability,” Journal of Educational
Psychology, 1941, 32, 381-385.


The assumption is made in formula (78) that all test items have
the same degree of difficulty, i.e., that the same proportion of subjects
(but not necessarily the same persons) pass each item. In a power
test, items are never of equal difficulty. Formula (78) will give a
satisfactory approximation to the test’s reliability, however, even 
when the test items cover a wide range of difficulty. Formula (78) 
always underestimates to a slight degree the reliability of a test as 
found by the split-half technique and the Spearman-Brown for- 
mula, and the more widely items vary in difficulty the greater the 
underestimation. This formula provides a minimum estimate of reli- 
ability—we may feel sure that the test is at least as reliable as we 
have found it to be by (78). 
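
As a rough check on the arithmetic, formula (78) can be evaluated directly from the number of items, the mean, and the SD. The function name `kr21` is an assumption made for illustration; the figures are those of the forty-item classroom test above.

```python
def kr21(n_items, mean, sd_total):
    """Formula (78): approximate reliability from n, M and the SD only.

    Assumes items scored 1 (right) or 0 (wrong) and of roughly equal
    difficulty, as the text notes.
    """
    variance = sd_total ** 2
    return (n_items * variance - mean * (n_items - mean)) / (variance * (n_items - 1))


# Forty items, mean 25.70, SD 6.00.
r11 = kr21(n_items=40, mean=25.70, sd_total=6.00)
print(round(r11, 2))  # 0.76
```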

Formulas (77) and (78) are not strictly comparable to the three 
methods for determining the reliability of test scores given above. 
In a sense, these formulas provide an estimate of the internal con- 
sistency of the test rather than an estimate of the dependability of 
test scores. The method of rational equivalence is superior to the 
split-half technique in certain theoretical aspects, but differences in 
reliability as found by the two methods are never very large (of the 
order .02, etc.). Formula (78) is often to be preferred to the split- 
half method because of the time and calculation it saves rather than 
for other reasons. 


2. Factors influencing the reliability of test scores: chance and constant 
errors 


Many factors affect the reliability of a test besides fluctuations in 
interest and attention, shifts in emotional attitude, and the differen- 
tial effects of memory and practice. To these “psychological” factors 
must be added environmental disturbances such as distractions, 
noises, interruptions, errors in scoring, and the like. All of these vari- 
able influences (environmental and psychological) are subsumed 
under the head “chance errors.” Errors, to be truly “chance,” must
influence a score in such a way as to cause it to vary above its “true”
value as often as below it. The reliability coefficient is a quantitative
estimate of the importance of chance or variable influences upon
test scores. 

Constant errors, as distinguished from chance errors, work in only 
one direction. Constant errors may raise or lower all of the scores 
on a retest or on the alternate forms of the test, but will not affect the 
reliability coefficient. If every person taking Form B of a test is
scored 5 points too high, for example, the self-correlation of the test
(i.e., the correlation between Forms A and B) will not be affected,
but all of the scores on the second form will be in error by 5
points.
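
The point that a constant error leaves the self-correlation untouched can be verified numerically. The sketch below uses hypothetical Form A and Form B scores for seven subjects and a hypothetical helper `pearson_r`; adding 5 points to every Form B score shifts the mean by 5 but does not change the correlation.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores on two forms of the same test.
form_a = [42, 55, 61, 48, 70, 66, 53]
form_b = [45, 53, 64, 50, 68, 69, 51]

# A constant error: every Form B score is 5 points too high.
form_b_inflated = [score + 5 for score in form_b]

r_original = pearson_r(form_a, form_b)
r_inflated = pearson_r(form_a, form_b_inflated)
mean_shift = sum(form_b_inflated) / len(form_b_inflated) - sum(form_b) / len(form_b)

print(round(r_original, 3), round(r_inflated, 3), round(mean_shift, 1))
```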

How high should the self-correlation of a test be in order for the 
reliability of the test to be considered satisfactory? This is an impor- 
tant question, and its answer depends upon the nature of the test, 
the size and variability of the group tested, and the purpose for 
which the test was given. To distinguish reliably between the means 
of two relatively small groups of narrow range of ability (for exam- 
ple, a fifth grade and a sixth grade) a reliability coefficient need be 
no higher than .50 or .60. If the test is to be used to differentiate 
among the individuals in the group, however, its reliability should be 
.90 or more. Most of the authors of intelligence tests and educational
achievement examinations report correlations of .90 or more between 
alternate forms of their tests. Since the self-correlation of a test is 
directly affected by the variability within the group, in reporting a 
test’s reliability coefficient the standard deviation of the group should 
always be given. 


3. The effect upon reliability of lengthening or repeating a test 


(1) THE RELIABILITY COEFFICIENT FROM MANY APPLICATIONS OR REP- 
ETITIONS OF A GIVEN TEST 


The mean of five determinations of height will, in general, be more
reliable than a single determination (p. 183), and the mean of ten 
determinations will (in general) be more reliable than the mean of 
five. On the same principle, increasing the length of the test, or 
averaging the results obtained from several applications of the test, 
or from alternate forms, will tend to increase reliability. If the self- 
correlation of a test is not satisfactory what will be the effect of 
doubling or tripling the test’s length? To answer this question experimentally
would require considerable time and labor. Fortunately,
a good measure of the effect of lengthening or repeating a test
may be obtained from the Spearman-Brown “prophecy formula”:

$$r_{nn} = \frac{n r_{11}}{1 + (n - 1) r_{11}} \qquad (79)$$
(Spearman-Brown formula for estimating the correlation
between n forms of a test, and n other similar forms)


in which

$r_{nn}$ = the correlation between n forms of a test and n alternate forms
(or the mean of n forms against the mean of n other forms);

$r_{11}$ = the reliability coefficient.


The subscripts (“11”) show that the correlation is between two forms
of the same test.

To illustrate the use of formula (79) suppose that in a group of 
100 adults the self-correlation of a test is .70. What will be the effect 
upon test reliability of tripling the length of the test? Substituting
$r_{11}$ = .70 and n = 3 in formula (79) and solving for $r_{nn}$, we have


$$r_{nn} = \frac{3 \times .70}{1 + 2 \times .70} = \frac{2.10}{2.40} = .88$$

Tripling the test’s length, therefore, increases its reliability coefficient
from .70 to .88. Instead of tripling the length of the test we
could give three parallel forms of the test and average the three
scores made by each person. The reliability of these mean scores
(each based upon three measures) will be the same, as far as purely
statistical factors are concerned, as the reliability got by tripling the
length of the test.
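
The prophecy formula lends itself to a one-line function. The name `spearman_brown` is an assumption for illustration; the call reproduces the tripling example, where the exact value .875 is what the text rounds to .88.

```python
def spearman_brown(r11, n):
    """Formula (79): predicted reliability when a test with
    self-correlation r11 is made n times as long (or n comparable
    forms are averaged)."""
    return n * r11 / (1 + (n - 1) * r11)


# Tripling a test whose self-correlation is .70.
r_tripled = spearman_brown(0.70, 3)
print(round(r_tripled, 3))  # 0.875
```

Setting n = 1 returns the original coefficient unchanged, which is a convenient sanity check on any implementation.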

The prophecy formula may also be used to find how many times a
test should be repeated in order for test scores to reach a given standard
of reliability. Suppose that the self-correlation of a test is .80.
How much will the test have to be lengthened, or how many times
repeated, in order to insure a reliability coefficient of .95? Substituting
$r_{11}$ = .80 and $r_{nn}$ = .95 in the formula, and solving for n, we have

$$.95 = \frac{.80n}{1 + .80n - .80} = \frac{.80n}{.20 + .80n}$$
and
n = 4.75, or 5 in whole numbers


The test must be five times its present length, therefore, or five alternate
forms must be given and averaged, before the self-correlation
of the test will reach .95.

Predictions of test reliability by the Spearman-Brown formula are
valid only when the items or questions added to the test cover the 
same ground, are of equal range of difficulty, and are comparable in 
other respects to the items of the original test. When these conditions 
are satisfied, there would appear to be no reason, as far as the mathematical
process is concerned, why we could not boost the self-correlation
of a test to any desired figure, simply by continuing to increase
its length or by continuing to repeat it. But it is highly improbable
that the reliability coefficient of a test could be so increased indefinitely.
In the first place, it is impracticable if not impossible to
increase a test’s length, say, ten or fifteen times. Furthermore, beyond
a certain point, boredom, fatigue, loss of incentive, and the like
inevitably affect our results and lead to “diminishing returns.” When 
the material added to the test is strictly comparable to the original 
test items, and when motivation remains substantially constant, the 
experimental evidence * indicates that a test may be increased to six 
or seven times its original length, and the Spearman-Brown formula 
will still give a close estimate of empirically determined results. But 
after the first four or five lengthenings the prophecy formula may 
“over-predict”—give higher estimated reliabilities than those ob- 
tained by actual calculation. This is not an especially serious draw- 
back, however, as a test which needs so much lengthening in order 
to yield reliable results should be radically changed in form or con- 
tent, or better still, perhaps, discarded in favor of another test. 
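
The amount of lengthening needed to reach a given standard can be found by solving the prophecy formula for n, which gives n = r_nn(1 - r_11) / [r_11(1 - r_nn)]. The sketch below is only the algebra, under the same assumption of strictly comparable added material; the function name `required_length_factor` is hypothetical.

```python
from math import ceil


def required_length_factor(r11, target):
    """Solve the Spearman-Brown formula for n: how many times its present
    length a test must be to move from reliability r11 to `target`."""
    return target * (1 - r11) / (r11 * (1 - target))


# Raising a self-correlation of .80 to .95.
n_exact = required_length_factor(0.80, 0.95)
n_forms = ceil(n_exact)  # whole number of forms or lengthenings
print(round(n_exact, 2), n_forms)  # 4.75 5
```

A result beyond four or five lengthenings falls in the region where, as noted above, the formula may over-predict.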

The Spearman-Brown formula may be applied to ratings, judg- 
ments, and other estimates as well as to test items. When measuring 
the reliability of a personality rating scale, for instance, by correlat- 
ing the ratings made by two equally competent judges, we may em- 
ploy the prophecy formula to estimate the increased reliability which 
might be expected if there were four, six or more judges.†


* Holzinger, K. J., and Clayton, B., “Further Experiments in the Application
of the Spearman-Brown Prophecy Formula,” Journal of Educational Psychology, 1925.
Ruch, G. M., Ackerson, Luton, and Jackson, J. D., “An Empirical Study of the
Spearman-Brown Formula as Applied to Educational Test Material,” Journal
of Educational Psychology, 1926, 17, 309-313.
† Remmers, H. H., Shock, N. W., and Kelly, E. L., “An Empirical Study of
the Validity of the Spearman-Brown Formula as Applied to the Purdue Rating
Scale,” Journal of Educational Psychology, 1927, 18, 187-195.


(2) THE RELIABILITY COEFFICIENT FROM ONE APPLICATION OF A TEST


When a test has no alternate form and cannot well be repeated, we
may calculate the reliability of half of the test and then proceed to
estimate the reliability of the whole test by the Spearman-Brown
formula. This method is called the “split-half technique” (p. 334).
The procedure is to make up two sets of scores by combining, say,
alternate exercises or items in the test. The first set of scores represents,
for example, performance on the odd-numbered items, 1, 3,
5, 7, etc.; and the second set of scores performance on the even-numbered
items, 2, 4, 6, 8, etc. Other ways of making the two halves of
the test as comparable as possible in content, difficulty, and susceptibility
to practice may be employed, but the method described is the
one most commonly used. From the self-correlation of the half test,
the reliability coefficient of the whole test may be estimated from the
formula


$$r_{11} = \frac{2 r_{\frac{1}{2}\frac{1}{2}}}{1 + r_{\frac{1}{2}\frac{1}{2}}} \qquad (80)$$


(Spearman-Brown formula for estimating reliability
from two comparable halves of a test)

in which

$r_{11}$ = the reliability coefficient of the whole test;
$r_{\frac{1}{2}\frac{1}{2}}$ = the reliability coefficient of one-half of the test, found experimentally.

When the reliability coefficient of one-half of a test ($r_{\frac{1}{2}\frac{1}{2}}$) is .60 it follows
from formula (80) that the reliability of the whole test ($r_{11}$)
is .75.
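
The odd-even procedure and the step-up by formula (80) can be sketched together. The item matrix below (six subjects, six items scored 1 or 0) is hypothetical, as are the names `pearson_r`, `step_up` and `split_half_reliability`; a real test would of course be much longer.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def step_up(r_half):
    """Formula (80): whole-test reliability from half-test reliability."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_matrix):
    """Correlate odd-item and even-item scores, then apply formula (80)."""
    odd = [sum(row[0::2]) for row in item_matrix]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_matrix]  # items 2, 4, 6, ...
    r_half = pearson_r(odd, even)
    return r_half, step_up(r_half)


# Hypothetical answers: rows are subjects, columns are items in order.
answers = [
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [1, 1, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 0],
]

r_half, r_whole = split_half_reliability(answers)
print(round(step_up(0.60), 2))  # 0.75, the example in the text
print(round(r_half, 2), round(r_whole, 2))
```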


4. The index of reliability 


An individual's “true score” on a test (p. 185) is defined as the
mean of a very large number of determinations made of the given 
person on the same test or parallel forms of the test administered 
under approximately identical conditions. The correlation between 
a series of obtained scores and their corresponding theoretically 
“true” scores may be found by the formula 


$$r_{1\infty} = \sqrt{r_{11}} \qquad (81)$$
(correlation between obtained scores on a given test and
true scores in the function measured by the test)
in which
$r_{11}$ = the reliability coefficient of the given test;
$r_{1\infty}$ = the correlation between obtained and true scores.

The symbol “∞” (infinity) designates “true scores,” that is, scores
obtained from an “infinite” number of administrations of the test
to the same group.

The coefficient $r_{1\infty}$ is called the index of reliability; it measures the
trustworthiness of test scores by showing how well obtained scores 
agree with their theoretically true counterparts. The index of reli- 
ability gives the maximum correlation which the given test is capa- 
ble of yielding. This follows from the fact that “the highest possible 
correlation which can be obtained (except as chance might occasion- 
ally lead to higher spurious correlation) between a test and a second 
measure is with that which truly represents what the test actually 
measures, that is, the correlation between the test and the true scores 
of individuals in just such tests." * 

To illustrate the application of the index of reliability, suppose 
that for a given test the self-correlation is .64. Then $r_{1\infty} = \sqrt{.64}$ or
.80; and .80 is the highest correlation of which this test is capable,
since it represents the relationship between obtained test scores and
true test scores in the same function. If the self-correlation of a test
is only .25, so that $r_{1\infty} = \sqrt{.25}$ or .50, it is obviously a waste of time
to continue using this test without lengthening or otherwise improv- 
ing it. A test whose index of reliability is only .50 is an extremely 
poor estimate of the function which it is trying to measure. 
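
For completeness, the index of reliability is simply the square root of the reliability coefficient. The function name `index_of_reliability` is an assumption; the two values are those discussed above.

```python
from math import sqrt


def index_of_reliability(r11):
    """Formula (81): correlation between obtained and true scores."""
    return sqrt(r11)


for r11 in (0.64, 0.25):
    print(r11, round(index_of_reliability(r11), 2))
```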


5. The standard error of an obtained score 


The effects of variable or chance errors in producing divergencies 
of obtained scores from their true counterparts may be estimated
by the formula

$$\sigma_{1\infty} = \sigma_1 \sqrt{1 - r_{11}} \qquad (82)$$
(standard error of an obtained score)
in which
$\sigma_{1\infty}$ = the standard error of an obtained score (sometimes called the
“standard error of measurement”);
$\sigma_1$ = the standard deviation of the test scores;
$r_{11}$ = the reliability coefficient of the test.


* Kelley, T. L., “The Reliability of Test Scores,” Journal of Educational
Research, 1921, 3, 327.


The subscript “$1\infty$” indicates this standard deviation to be a measure
of the error made in taking an obtained score (i.e., 1) as an estimate
of the true score (i.e., ∞). To illustrate the use of $\sigma_{1\infty}$, suppose that
in a group of 300 college freshmen the reliability coefficient of an
aptitude test in mathematics is .92 and the SD of this distribution is
15.00. From formula (82) we have


$$\sigma_{1\infty} = 15\sqrt{1 - .92} = 4.2 \text{ or 4 in whole numbers}$$


and the odds are 2:1 that the obtained score made by any individual 
in the group does not differ from his true score by more than ±4
points. If subject AB has a score of 85, we may feel confident (the
chances are .95) that his score “actually” lies between 77 and 93
(±1.96 × 4.2).* Generalizing for the entire group, we should expect
about two-thirds of the 300 scores to be in error by 4 points or 
less; the other one-third (or 100) to be in error by more than 4 
points. 
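
The standard error of an obtained score and the .95 band around a score can be computed as below. The function names `standard_error_of_score` and `score_band` are illustrative assumptions; the figures are those of the freshman aptitude test above, and 1.96 is the usual normal-curve multiplier for chances of .95.

```python
from math import sqrt


def standard_error_of_score(sd, r11):
    """Formula (82): SD of the test scores times sqrt(1 - reliability)."""
    return sd * sqrt(1 - r11)


def score_band(obtained, sd, r11, z=1.96):
    """Limits within which the true score is expected to lie,
    z standard errors on either side of the obtained score."""
    se = standard_error_of_score(sd, r11)
    return obtained - z * se, obtained + z * se


se = standard_error_of_score(15.00, 0.92)
low, high = score_band(85, 15.00, 0.92)
print(round(se, 1), round(low), round(high))  # 4.2 77 93
```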

The reader should note carefully the difference between $\sigma_{(est.)}$ (see
p. 162) and $\sigma_{1\infty}$. The first formula enables us to say with what degree
of assurance we can predict an individual's score on one test when we 
know his score on a second (and usually a different) test. The actual 
prediction of the most probable score is made, of course, by way of 
the regression equation connecting the two variables (p. 159). The 
SE of an obtained score, $\sigma_{1\infty}$, is also an estimate formula; it tells us
how adequately an obtained score represents the true score. Although
the true score is unknown, we can, nevertheless, tell from $\sigma_{1\infty}$ how
much our obtained score probably misses the true value. The SE of
an obtained score is the best method of expressing the reliability of a
test, since it takes account of the self-correlation of the test as well 
as of the variability within the group. 

Formula (82) provides a general estimate of the SE of any score 
over the entire range of the test. When the range is wide, the agree- 
ment of scores on two forms of the test may differ considerably at 
successive parts of the scale. To refine our estimate of the reliability 
of our test scores, we may compute $\sigma_{1\infty}$ for different levels of achievement.
This has been done for the new Stanford-Binet; the $\sigma_{1\infty}$ for
I.Q.'s 130 and above, for example, is 5.24, for I.Q.'s 90-109, 4.51, for
I.Q.'s 70 and below, 2.21, etc. The method is described in the references
given below.†


* p. 187.
† Terman, L. M., and Merrill, M. A., Measuring Intelligence (Boston: Houghton
Mifflin Co., 1937), p. 46.
McNemar, Q., “The Expected Average Difference between Individuals
Paired at Random,” Journal of Genetic Psychology, 1933, 43, 438-439.


6. The dependence of the reliability coefficient upon the range and variability
of the group


The reliability coefficient of a test administered to a group of small
range (a single grade, say), cannot be compared directly with the 
reliability coefficient of the same test administered to a group of 
greater range, e.g., to the children in several grades. The self-corre- 
lation of a test (like any correlation coefficient) is affected by the 
variability of the group; and the larger and more heterogeneous the 
group, the greater test variability tends to be. If we know the self- 
correlation of a test in a narrow range we can estimate the self-correlation
of the same test in an increased range (ordinarily a larger
group) by the formula


$$\frac{\sigma_s}{\sigma_l} = \frac{\sqrt{1 - r_{ll}}}{\sqrt{1 - r_{ss}}} \qquad (83)$$


(relation between σ's and reliability coefficients obtained in different
ranges when the test is equally effective throughout both ranges)


in which
$\sigma_s$ and $\sigma_l$ = the σ's of the test scores in the small and large ranges,
respectively;
$r_{ss}$ and $r_{ll}$ = the reliability coefficients in the small and large ranges.


To illustrate the use of formula (83) suppose that for a single fifth
grade, $r_{ss}$ = .50, and $\sigma_s$ = 5.00; and that for a larger group made up
of children from grades three to seven, $\sigma_l$ = 15.00. Assuming our test
to be as effective in the wide range as in the narrow, what is the reliability
coefficient of the test in the wide range? If we substitute for
$\sigma_s$, $\sigma_l$ and $r_{ss}$ in formula (83), $r_{ll}$ = .94. This means that a reliability
coefficient of .50 in the narrow range indicates as high a degree of test 
consistency as a reliability coefficient of .94 in a group in which the 
range is three times as wide. 
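
The fifth-grade example can be checked by solving the range relation for the reliability in the wide range, which gives r_ll = 1 - (σ_s/σ_l)²(1 - r_ss). The function name `reliability_in_wider_range` is an assumption for illustration, and the sketch rests on the stated condition that the test is equally effective in both ranges.

```python
def reliability_in_wider_range(r_small, sd_small, sd_large):
    """Estimate the reliability coefficient in a wide range from the
    coefficient in a narrow range and the two standard deviations."""
    return 1 - (sd_small / sd_large) ** 2 * (1 - r_small)


# Single fifth grade: r = .50, SD = 5.00; grades three to seven: SD = 15.00.
r_large = reliability_in_wider_range(0.50, 5.00, 15.00)
print(round(r_large, 2))  # 0.94
```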


II. The Validity of Test Scores


The validity of a test, or of any measuring instrument, depends 
upon the fidelity with which it measures whatever it purports to
measure. A homemade yardstick is valid when measurements made
by it are proved to be accurate by standard measuring rods. And in
the same way a test is valid when the capacity which it gauges cor- 
responds to the same capacity as otherwise objectively measured and 
defined. The difference between validity and reliability can be made 
clear, perhaps, by an illustration. Suppose a clock is set forward 
twenty minutes. If the clock is a good timepiece, the time it “tells” 
will be reliable (i.e., consistent), but it will not be valid as judged by
“standard time.” The reliability of the measurements made by
scales, thermometers, yardsticks, chronoscopes, clocks, etc., is determined
by making repeated measurements of the same facts; and
validity is determined by comparing the measures returned by the
given instrument with highly precise (if arbitrary) standard measures.
The reliability of mental measures is found in the same way.
But since precise and independent standards (criteria) are rarely 
found in mental measurement, the validity of a test can never be 
estimated as precisely as can the validity of a thermometer or a 
rheostat. 


1. The determination of validity through correlation with a criterion


The validity of a test is determined directly, whenever possible, by 
finding the correlation between the test and some independent criterion.
A criterion is an objective measure in terms of which the
value of the test is estimated or judged. The criteria for evaluating 
a general intelligence examination, for example, may be school 
marks, ratings for aptitude in learning, or some other test believed to 
be valid, such as Stanford-Binet. A trade test may be validated 
against demonstrated ability to carry on the required operations as 
shown in actual performance.* A high correlation between a test and 
a criterion is evidence of validity provided the test and the criterion 
are both reliable. But before accepting criterion correlations, we 
must know the reliability of the test and if possible the reliability of 
the criterion. 

* Stead, W. H., and Shartle, C. L., Occupational Counseling Techniques,
op. cit., Chapters 5 and 8 especially.

When a criterion is not immediately available, indirect methods
may be utilized for estimating the validity of a test. We may, for
example, compute the average correlation which each test in a battery
shows with all of the other tests, and estimate the validity (i.e.,
the representativeness) of each test by the size of its correlations.
Again, following essentially the same method, we may combine the
scores on a number of tests designed to measure the same function
(memory, say), and consider as most valid that test which correlates
highest with the average of them all. Anastasi,* for example, found
that of eight tests of immediate memory, the paired-associates test
(geometric form paired against numbers) had the largest average
correlation (i.e., .49) with the other tests of the battery. This test,
then, is the most valid measure of the function tapped in common by
all of the tests.


2. The correction for attenuation 


The correlation between a test and its criterion will be reduced if
either the test scores or the criterion scores or both are unreliable.
In order to estimate the correlation between true scores in two variables,
we need to make a correction which will take account of the
unreliability in both sets of measures. Such a correction is given by
the formula


$$r_{\infty\omega} = \frac{r_{12}}{\sqrt{r_{1I} \times r_{2II}}} \qquad (84)$$


(correlation between true measures in Tests 1 and 2)
in which
$r_{\infty\omega}$ = correlation between true scores in Tests 1 and 2;
$r_{12}$ = correlation between obtained scores in Tests 1 and 2;
$r_{1I}$ = reliability coefficient of Test 1;
$r_{2II}$ = reliability coefficient of Test 2.


Formula (84) is the well-known correction for attenuation formula.
It provides a correction for the effects of those chance or accidental
errors in the two tests which lower the reliability coefficients
of both tests and thus affect the correlation between them. To
illustrate the application of formula (84), let the obtained correlation
between two tests A and B be .60, the reliability coefficient of Test A
be .80 ($r_{1I}$) and the reliability coefficient of Test B be .90 ($r_{2II}$).
What is the correlation between Tests A and B freed of chance
errors? Substituting the given values in formula (84), we have


$$r_{\infty\omega} = \frac{.60}{\sqrt{.80 \times .90}} = .71$$


as the estimated correlation between true scores in A and B. Our


* Anastasi, A., “A Group Factor in Immediate Memory,” Archives of
Psychology, 1930, 120, p. 41.
corrected coefficient of correlation represents the relationship which
we should expect to obtain if our two sets of test scores were perfect 
measurements. 
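
The arithmetic of formula (84) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `correct_for_attenuation` is hypothetical, and the reliabilities are assumed to be positive and not greater than 1.00.

```python
import math


def correct_for_attenuation(r12, rel1, rel2):
    """Formula (84): obtained r divided by the square root of the
    product of the two reliability coefficients."""
    if not (0 < rel1 <= 1 and 0 < rel2 <= 1):
        raise ValueError("reliability coefficients must lie in (0, 1]")
    return r12 / math.sqrt(rel1 * rel2)


# Tests A and B from the text: r = .60, reliabilities .80 and .90.
corrected = correct_for_attenuation(0.60, 0.80, 0.90)
print(round(corrected, 2))
```

With the values of the worked example the function returns about .71, the figure obtained by hand.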

It is clear from formula (84) that correcting for chance errors will
always raise the correlation between two tests, unless the reliability
coefficients are both 1.00. Chance errors, therefore, always lower
or attenuate an obtained correlation coefficient. The expression
$\sqrt{r_{1I} \times r_{2II}}$ sets an upper limit to the correlation which we can obtain
between two tests as they stand. In the example above, $\sqrt{.80 \times .90}$
= .85; hence, Tests A and B cannot correlate higher than .85, as
otherwise their corrected r would be greater than 1.00.

Let us assume the correlation between first-year college grades and
a general intelligence test to be .46; the reliability of the intelligence
test to be .82; and the reliability of college grades to be .70. The
maximum correlation which we could hope to obtain between these
two measures is $.46 / \sqrt{.70 \times .82}$, or about .60. Knowing that the
correlation between grades and general intelligence, corrected for errors
of measurement, has a probable maximum value of .60 gives us a better
notion of the "intrinsic" relationship between the two variables. At
the same time, the investigator should remember that the $r_{\infty\omega}$ of .60 is
a theoretical, not an obtained, value; that it gives an estimate of the
relationship to be expected when the tests are more effective than
they actually were in the present instance. If many sources of error
are present so that considerable correction is necessary, it would be
better experimental technique to improve the tests and the
experimental conditions than to correct the obtained r.

The investigator must be careful how he applies formula (84) to 
correlations which have been averaged, as in such cases the reliability 
coefficients may be lower than the correlations between the two tests. 
When this happens $r_{\infty\omega}$ is greater than 1.00. Such a result is logically
and psychologically meaningless. If a corrected r is 1.00, or is only 
slightly greater than 1.00, however, it may be taken as indicating 
complete agreement between the two variables within the error of 
computation. 


3. The estimation of the true $\sigma$ of a test


Chance or variable errors have a marked effect upon the standard
deviation of a test, as well as upon the r between tests. The relation
of the $\sigma$ calculated from obtained scores on a test to the $\sigma$ of true
scores on the same test is given by the formula


$$\sigma_\infty = \sigma_1 \sqrt{r_{1I}} \qquad (85)$$

(relation between true and obtained $\sigma$'s for a set of test scores)
in which
$\sigma_\infty$ = the $\sigma$ of the true test scores;
$\sigma_1$ = the $\sigma$ of the obtained test scores;
$r_{1I}$ = the reliability coefficient of the test.


Suppose an educational achievement test of seventy-five items
has been administered to a group of fifty children. The obtained
standard deviation, $\sigma_1$, is 10, and the reliability coefficient of the test
($r_{1I}$) is .50. What is $\sigma_\infty$, the $\sigma$ of the true scores from which variable
or accidental errors have been eliminated? Substituting $\sigma_1 = 10$ and
$r_{1I} = .50$ in formula (85),


$$\sigma_\infty = 10\sqrt{.50} = 7.1$$


and the “true $\sigma$” of the test is about 7 points.
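
As a check on the substitution in formula (85), a minimal Python sketch follows. The function name `true_sigma` is hypothetical and is introduced only for illustration.

```python
import math


def true_sigma(obtained_sigma, reliability):
    """Formula (85): sigma of true scores = obtained sigma times the
    square root of the reliability coefficient."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must lie between 0 and 1")
    return obtained_sigma * math.sqrt(reliability)


# Achievement test from the text: obtained sigma 10, reliability .50.
sigma_true = true_sigma(10, 0.50)
print(round(sigma_true, 2))
```

The result, a little over 7, agrees with the value found above.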

It is clear from (85) that $\sigma_\infty$ will always be smaller than $\sigma_1$, except
in the improbable case in which $r_{1I}$ = 1.00. The effect of chance
errors of measurement, then, is always to increase the spread ($\sigma_1$) of
obtained test scores or of criterion scores.


4. Validation of a test battery * 


A criterion of job efficiency, say, or of success in salesmanship may
be forecast by a battery consisting of four, five, or more tests. The
validity of such a battery is determined by the multiple correlation
coefficient, R, between the battery and the criterion. The weights to
be attached to scores on the sub-tests of the battery are given directly
by the regression coefficients (p. 393).

If the regression weights are small fractions (as they often are)
whole numbers may be substituted for them with little if any loss in
accuracy. For example, suppose that the regression equation joining
the criterion and the tests in a battery reads as follows:


$$C \text{ (criterion)} = 4.32X_1 + 3.12X_2 - .65X_3 + 8.35X_4 + K \text{ (constant)}$$

Dropping fractions and taking the nearest whole numbers, we have

$$C = 4X_1 + 3X_2 - 1X_3 + 8X_4 + K$$


* Gulliksen, H., Theory of Mental Tests (New York: John Wiley and Sons, 
1950), Chapter 20 especially. 
Scores in Test 1 should be multiplied by 4, scores in Test 2 by 3,
scores in Test 3 by —1, and scores in Test 4 by 8, in order to pro- 
vide the best forecast of C, the criterion. The fact that Test 3 has a 
negative weight does not mean that this test has no value in forecast- 
ing C, but simply that the best estimate of C is obtained by giving 
scores in Test 3 a negative value. 


III. Item Analysis


In Section II above, we considered the validity of final test scores.
The validity of a test score also depends directly upon the care with
which the items in the test have been chosen. While the subject of
item analysis properly belongs in a book on test construction, the
main features of the process may be outlined here. Item analysis
may be divided into three main topics: (1) item selection, (2) item
difficulty, and (3) item validity.


1. Item selection


The initial choice of test items depends upon the judgment of competent
persons as to the suitability of the material for the purposes of
the test. Certain types of items, for instance, have proved to be gen- 
erally useful in intelligence examinations. Problems in mental arith- 
metic, for example, vocabulary, analogies and number series comple- 
tion, are often encountered; also, items requiring generalization, 
interpretation and the ability to see relations. The validity of most 
standard tests of educational achievement depends upon the consen- 
sus of teachers and other competent judges as to the adequacy of the 
items included. Courses of study, requirements for different grades, 
curricula from different sections of the country are carefully culled 
over by the test makers to determine what material in history, English,
geography, etc., should be included in an educational achievement
battery designed, say, for the seventh grade. In its final form
the educational achievement test represents items carefully selected 
from all available sources of information. 

Items used in personal data sheets, interest inventories, attitude 
scales and the like, also represent a consensus of experts as to the 
most diagnostic items in the areas sampled. 


2. Item difficulty


The difficulty of an item is determined by the proportion of some
standard group able to solve the item correctly. The scaling of separate
test items has been described in Chapter 12, page 301. When
normality of distribution can be assumed for the ability being measured,
single items or groups of items (scores) may be scaled, i.e.,
given difficulty values along a scale in terms of $\sigma$. It has been
customary to select items for a test which vary in difficulty from easy to
hard. The average person in the standardization group will then pass
about one-half (50%) of the items in the test. It can be shown, however,
that the sharpest discrimination as between good and poor subjects
is provided by items which are passed by 50% of the members
of a group. A test made up of items all of which are passed by
approximately 50% (but by different persons, of course) would
theoretically be the most discriminating test. But it would be difficult to
construct such an examination and it is probable that a test made up
of items covering a wider range of difficulty is psychologically a
better measuring device. In standardizing a test care must be taken
that few, if any, subjects achieve perfect or zero scores, as in neither
case is the person measured by the test.
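
The proportion passing each item is easily computed from a table of right-wrong responses. The sketch below is illustrative only: the response matrix is made up, and the function name `item_difficulty` is hypothetical. It also prints p(1 - p), the variance of an item scored 1 or 0, which is largest when p = .50; this is one way of seeing why items passed by half the group separate subjects most sharply.

```python
def item_difficulty(responses):
    """Proportion of the group passing each item.

    responses: list of rows, one per subject, each a list of 0/1 scores.
    """
    n_subjects = len(responses)
    n_items = len(responses[0])
    return [sum(row[j] for row in responses) / n_subjects
            for j in range(n_items)]


# Hypothetical data: four subjects, three items (1 = pass, 0 = fail).
responses = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 0],
]
p_values = item_difficulty(responses)
item_variance = [p * (1 - p) for p in p_values]
print(p_values, item_variance)
```

In this made-up group the first item is passed by everyone and so tells us nothing about differences among the subjects; its variance is zero.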


3. Item validity 


An often-used method of validating a test item is to determine
whether the item discriminates between subjects differing sharply in 
the function being measured. This “criterion of internal consistency” 
admits into the final test or questionnaire only those items which 
have been found to separate high-scoring and low-scoring members 
of the group. In an internally consistent test, items “hang together” 
in the sense that they work in the same direction and measure the 
same common trait.† In one study, eighty-six items were selected
out of 222 on the basis of their ability to discriminate among the 
lower, middle, and upper thirds of the group. These eighty-six 
“good” items did a better job (higher reliability and validity) than a 
test two and a half times as long. 

* Ferguson, G. A., “The Factorial Interpretation of Test Difficulty,”
Psychometrika, 1941, 6, 323-329.

† Anderson, J. E., “The Effect of Item Analysis upon the Discriminative
Power of an Examination,” Journal of Applied Psychology, 1935, 19, 237-244.


The validity of a single test item may also be determined by finding
its correlation with total scores in the test of which it is a part, or
by finding its correlation with scores in some independent criterion.
The bi-serial method (p. 356) is the standard procedure for determining
item validity through correlation. Application of bi-serial r to
each item in a test requires considerable computation, however. For
this reason various short-cut methods for selecting good items by
formula and by graphical methods have been devised. References given
below should be consulted.*


PROBLEMS 


1. The reliability coefficient of a test is .60.

(a) How much must this test be lengthened in order to raise the
self-correlation to .90?

(b) What effect will doubling the test’s length have upon its reliability
coefficient? tripling the test's length?

2. A test of fifty items has a reliability coefficient of .78. What is the
reliability coefficient

(a) of a test having 100 items comparable to the items in the given test?

(b) of a test having 125 comparable items?

3. A given test has a reliability coefficient of .80 and a $\sigma$ of 20.

(a) What is the maximum correlation which this test is capable of
yielding as it stands?

(b) What is the standard error of a score obtained on this test?

(c) What is the estimated reliability coefficient of this test in a group
in which the $\sigma$ is 15?

4. A test of 100 items is given to a group of 225 subjects with the following
results: M = 62.50; $\sigma$ = 9.62.

(a) What is the reliability coefficient of the test by formula (78)?
(b) What is the estimated true $\sigma$ of this test?
(c) What is the standard error of a score on this test?

* Long, John A., and Sandiford, Peter, The Validation of Test Items,
Bulletin 3, 1935, University of Toronto, Department of Educational Research.
Flanagan, J. C., “General Considerations in the Selection of Test Items,”
Journal of Educational Psychology, 1939, 30, 674-680.
Guilford, J. P., “The Phi-coefficient and Chi-square as Indices of Item
Validity,” Psychometrika, 1941, 6, 11-19.
Richardson, M. W., and Adkins, D. C., “A Rapid Method of Selecting Test
Items,” Journal of Educational Psychology, 1938, 29, 547-552.
Hawkes, H. E., Lindquist, E. F., and Mann, C. R., Achievement Examinations
(Boston: Houghton Mifflin Co., 1936), Chapters 2 and 3, especially.
Gulliksen, H., Theory of Mental Tests, op. cit., Chapter 21.
Davis, F. B., Item-Analysis Data: their computation, interpretation, and use
in test construction, Cambridge, Mass.: Harvard Educ. Papers, #2, 1946.
5. Show (a) that when the reliability coefficient is zero, the standard error
of an obtained score equals the standard deviation of the test; and (b)
that when the reliability coefficient is 1.00, the standard error of an
obtained score equals zero.

6. A mathematics test has a reliability coefficient of .82, and a mechanical
ability test has a reliability coefficient of .76. The r between the two
tests is .52.

(a) What would the correlation be if both tests were perfect measures?

(b) What is the maximum correlation possible with the mathematics
test as it stands?

(c) What is the maximum correlation possible with the mechanical
ability test as it stands?

7. An intelligence examination shows a correlation of .50 with first-year
scholarship. The reliability coefficient of the test is .85, and of school
grades (i.e., the criterion) is .65. What is the highest validity coefficient
which we can hope to get with this test (i.e., corrected correlation
between test and grades)?

8. A test of seventy-five items has a $\sigma$ of 12.35. The $\Sigma pq$ = 16.46. What
is the reliability coefficient by formula (77)?

ANSWERS 


1. (a) six times
   (b) $r_{nn}$ = .75 (doubling length); $r_{nn}$ = .82 (tripling length)
2. (a) .88
   (b) .90
3. (a) .89
   (b) 8.9
   (c) .64
4. (a) .75
   (b) 8.34
   (c) 4.81
6. (a) .66
   (b) .91
   (c) .87
7. .68
8. .90
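
The answers to Problems 1 and 2 may be checked by machine. The sketch below assumes that the intended method is the Spearman-Brown prophecy formula in its usual form, r_nn = n r / (1 + (n - 1) r), where n is the number of times the test is lengthened; the function names are hypothetical.

```python
def lengthened_reliability(r, n):
    """Spearman-Brown: reliability of a test lengthened n times."""
    return n * r / (1 + (n - 1) * r)


def lengthening_needed(r, target):
    """Spearman-Brown solved for n: how many times the test must be
    lengthened to raise its reliability from r to target."""
    return target * (1 - r) / (r * (1 - target))


# Problem 1: reliability .60.
times_longer = lengthening_needed(0.60, 0.90)
doubled = lengthened_reliability(0.60, 2)
tripled = lengthened_reliability(0.60, 3)

# Problem 2: fifty items, reliability .78.
items_100 = lengthened_reliability(0.78, 100 / 50)
items_125 = lengthened_reliability(0.78, 125 / 50)

print(round(times_longer, 2), round(doubled, 2), round(tripled, 2),
      round(items_100, 2), round(items_125, 2))
```

Rounded to two places, the values agree with the answers listed for Problems 1 and 2.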


14 


FURTHER METHODS OF CORRELATION 


+ 


In Chapter 6 we described the linear, or product-moment correla- 
tion method, and in Chapter 7 showed how, by means of r and the 
regression equations, one can "predict" or "forecast" values of one 
variable from a knowledge of the other. Test scores, as we have seen, 
represent a series of determinations of a continuous variable taken 
along a numerical scale. The correlation coefficient is valuable to 
psychology and education as a measure of the relationship between 
test scores and other measures of performance. But many situations 
arise in which the investigator does not have scores and must work 
with data in which differences in a given attribute can be expressed 
only by ranks (e.g., in orders of merit); or by classifying an individual
into one of several descriptive categories. This is especially
true in vocational and applied psychology and in the field of personality
and character measurement. Again, there are problems in which
the relationship among the measurements made is non-linear, and
cannot be described by the product-moment r. In all of these cases
other methods of determining correlation must be employed; and the 
purpose of this chapter is to develop some of the more useful of these 
techniques. 


I. Computing Correlation from Ranks


Differences among individuals in many traits can often be expressed 
by ranking the subjects in one-two-three order when such differences 
cannot be measured directly. For example, persons may be ranked in 
order of merit for honesty, athletic ability, salesmanship, or social 
adjustment when it is impossible to measure these complex behaviors. 
In like manner, various products or specimens, such as advertisements,
color combinations, handwriting, compositions, jokes, and
pictures, which are admittedly hard to measure, may be put in order 
of merit for esthetic quality, beauty, humor, or some other character- 
istic. In computing the correlation between two series of ranks, spe- 
cial methods which take account of relative position have been 
devised. These methods may also be applied to scores which have 
been arranged in order of merit. When we have only a few scores 
(less than 25, say), it is often advisable to rank these scores in order 
of merit and compute the correlation by the rank-difference method 
instead of by the longer and more laborious product-moment method.
Coefficients of correlation calculated from a few cases are not very 
reliable at best, and their chief value lies in suggesting the possible 
existence of relationship—as in a preliminary survey. In such situa- 
tions the rank-difference method will give as adequate a result as that 
obtained by a more refined technique, and is much easier to apply. 


1. Calculation of $\rho$ (rho) from rank-differences


The rank-difference method is illustrated in Table 46. The problem 
is to find the relationship between the length of service and the sell- 
ing-efficiency of twelve salesmen. The names of the men (A, B, C, 
etc.) are listed in column (1) of the table, and in column (2), oppo- 
site the name of each man, is given the number of years he has been 
in the service of the company. In column (3), the men are ranked in 
order of merit in accordance with their length of service. For example,
G, who has been longest with the company, is ranked 1; C, whose
length of service is next longest, is ranked 2; and so on down the list. 
Note that both A and J have the same period of service, and that each 
is ranked 7.5. Instead of ranking the first man 7 and the second 
man 8, or both 7 or both 8, we compromise by ranking both 7.5 and 
F, who follows, 9.* 

In column (4) the men have been ranked by the sales manager in 
order of merit for efficiency as salesmen: C, the most efficient man, is 
ranked 1; and B, the least efficient, is ranked 12. In column (5) the 
difference (designated D) between each man’s efficiency rank and his 
years-of-service rank is entered; and in the last column each of these 
D's has been squared. Since each D is squared in column (6), no 
account need be taken of + and — signs in column (5). The correla- 

* If three men receive the same rank, e.g., 7, 8, 9, each is ranked 8 and the
next man in order is ranked 10. If four men receive the same rank, e.g., 7, 8,
9, and 10, each is ranked 8.5 and the next in order 11.
TABLE 46 To illustrate the rank-difference method of measuring correlation

   (1)         (2)          (3)            (4)            (5)          (6)
                          Order of       Order of      Difference   Difference
 Salesmen   Years of       Merit          Merit         between      squared
            Service      (Service)     (Efficiency)    ranks (D)     ($D^2$)

    A           5           7.5             6             1.5          2.25
    B           2          11.5            12              .5           .25
    C          10           2               1             1.0          1.00
    D           8           4               9             5.0         25.00
    E           6           6               8             2.0          4.00
    F           4           9               5             4.0         16.00
    G          12           1               2             1.0          1.00
    H           2          11.5            10             1.5          2.25
    I           7           5               3             2.0          4.00
    J           5           7.5             7              .5           .25
    K           9           3               4             1.0          1.00
    L           3          10              11             1.0          1.00
                                                     $\Sigma D^2$ =   58.00

$$\rho = 1 - \frac{6 \Sigma D^2}{N(N^2 - 1)} = 1 - \frac{6 \times 58}{12(143)} = .80$$


tion between the two orders of merit may be computed by substituting
for $\Sigma D^2$ and N in the formula

$$\rho = 1 - \frac{6 \Sigma D^2}{N(N^2 - 1)} \qquad (86)$$


(rank correlation coefficient, $\rho$)


in which D represents the difference in rank of an individual in the
two series; $\Sigma D^2$ is the sum of the squares of all such differences; and
N is the number of cases. Substituting 58 for the $\Sigma D^2$ and 12 for N
in formula (86), we obtain a $\rho$ of .80. The symbol $\rho$ (read as rho) is
the rank order coefficient of correlation. $\rho$ may be transmuted into a
product-moment r by means of tables, but the difference between $\rho$
and its equivalent r is so small that with little loss of accuracy $\rho$
may be taken as approximately equal to r.
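
The whole calculation of Table 46, including the averaging of tied ranks, can be sketched in Python. The function names are hypothetical, and the years of service are those transcribed from column (2) of Table 46; the efficiency ranks are taken as given by the sales manager.

```python
def average_ranks(values, descending=True):
    """Rank values 1..N; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=descending)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_difference_rho(ranks_x, ranks_y):
    """Formula (86): rho = 1 - 6 * sum(D^2) / (N * (N^2 - 1))."""
    n = len(ranks_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Salesmen A through L.
years = [5, 2, 10, 8, 6, 4, 12, 2, 7, 5, 9, 3]
efficiency_rank = [6, 12, 1, 9, 8, 5, 2, 10, 3, 7, 4, 11]

service_rank = average_ranks(years)
rho = rank_difference_rho(service_rank, efficiency_rank)
print(service_rank)
print(round(rho, 2))
```

The tied men, A and J, each receive the rank 7.5, as in column (3), and the coefficient rounds to .80.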


2. The significance of $\rho$ (rho)


Since $\rho$ is at best only an approximate measure of the relationship
indicated by r, it is hardly worth while computing its SE. Perhaps
the best way of estimating the reliability of $\rho$, if it is wanted, is
to test the obtained value of $\rho$ against the null hypothesis by means
of Table 25, p. 200. Thus for the problem in Table 46 we find for
(N - 2) or 10 df that an r must be .71 to be significant at the .01
level. Our computed $\rho$ of .80 is considerably larger than .71 and
hence is statistically significant though based upon only 12 ranks.


3. Summary on rank-difference correlation 


The product-moment method deals with the size of the score as 
well as its position in the series. Rank-differences, on the other hand,
take account only of the positions of the items in the series, making 
no allowance for the size of the gaps between adjacent scores. Indi- 
viduals, for example, who score 90, 89, and 70 on a given test are 
ranked 1, 2, 3 in order of merit, although the difference between 90 
and 89 is 1, and the difference between 89 and 70 is 19. Considerable 
accuracy may be lost in translating scores over into ranks, as gaps 
will appear in the rankings when a number of scores, all of the same 
size, receive the same rating. The rank-difference coefficient is rarely 
used with test scores when N is larger than 30 and is often an explor- 
atory and preliminary device. 


II. Measuring Correlation from Data Grouped
into Categories 


1. Bi-serial correlation


In many problems it becomes important to calculate the correla- 
tion between traits or attributes, when the members of the group can 
be measured (i.e., given scores) in the one variable, but can only be 
classified into two categories in the second or “dichotomous” variable.
(The term dichotomous means “cut into two parts.”) We may,
for instance, wish to know the correlation between MA and “social
adjustment” in a group of nursery-school children, when our subjects 
have been given scores in the first trait, but are simply classified as 
“socially adjusted” or “not socially adjusted” in the second trait. 
Other examples of dichotomous classification with reference to some 
attribute are athletic-nonathletic, radical-conservative, socially 
minded-mechanically minded, literate-illiterate, above eighth grade 
in school-below eighth grade, and the like. The correlation between a 
set of scores and two-category classifications like those listed cannot 
readily be found by the ordinary product-moment r or by the
rank-difference formula. We can, however, compute a bi-serial coefficient
of correlation if we may assume that the trait in which we have made 
a two-way split would be continuous and normally distributed if 
more information were available. 

Many test and question items are scored to give two responses: 
for example, problems marked Passed or Failed, statements True or 
False, personality inventory items Yes or No, interest items Like or 
Dislike, and so on. When a two-category split cannot be regarded 
as representing a normal distribution but is in fact two separate
groupings, the point bi-serial r provides a useful measure of relation. 


(1) CALCULATION OF BI-SERIAL r


The calculation of bi-serial r is illustrated in Table 47. The prob- 
lem is to find the correlation between total scores on a test and the 
answers to a single item in the test (Item 72); or put differently, to 
find whether those who make high scores on the test tend to answer 
Item 72 “Yes” more often than “No.” The first column of Table 47 
gives the class-intervals of the score distribution. Column two gives 


TABLE 47 To illustrate the calculation of the bi-serial r between total
scores on a test and the answers to a single item on the test


 Scores        Responses to Item 72
 on Test       “Yes”      “No”        f
 80-84           …          …         3
 75-79           4          2         6
 70-74           6          2         8
 65-69           5          5        10
 60-64          10          9        19
 55-59           …          …        20
 50-54           …          …         …
 45-49           4          …         …
 40-44           3          2         5
 35-39           …          …         4
 30-34           …          …         2
 25-29           …          …         1
                60         40       100
               (p)        (q)

$M_T$ = 58.05; mean of all scores (N = 100)
$\sigma$ = 11.63; $\sigma$ of all scores (N = 100)
$M_p$ = 60.08; mean of “Yes” responses (N = 60)
$M_q$ = 55.00; mean of “No” responses (N = 40)
p = .60; proportion answering “Yes” to Item 72
q = .40; proportion answering “No” to Item 72
z = .386; height of ordinate separating 60% from 40% in a normal
    distribution (Table 48)

$$r_{bis} = \frac{M_p - M_q}{\sigma} \times \frac{pq}{z} = \frac{60.08 - 55.00}{11.63} \times \frac{(.60)(.40)}{.386} = .27 \qquad (87)$$

$$SE_{r_{bis}} = \frac{\frac{\sqrt{pq}}{z} - r_{bis}^2}{\sqrt{N}} = \frac{\frac{\sqrt{(.60)(.40)}}{.386} - (.27)^2}{\sqrt{100}} = .12 \qquad (88)$$
the distribution of scores made by the sixty subjects who answered
“Yes” to Item 72, and column three the distribution of scores made 
by the forty subjects who answered “No.” The sum of all of the fre- 
quencies on the score-intervals gives the total distribution of 100 
cases (column four). The steps in calculating bi-serial r from here 
on are as follows: 


Step 1


Calculate $M_p$, the mean of the scores made by the sixty subjects
who answered “Yes” to Item 72. Also calculate $M_q$, the mean of the
scores made by the forty subjects who answered “No” to Item 72. In
our problem, $M_p$ = 60.08, and $M_q$ = 55.00.


Step 2 


Calculate the $\sigma$ of the whole distribution, that is, the distribution of
the 100 scores. This $\sigma$, which equals 11.63, gives the spread of the test
scores in the entire group. 


Step 3 


Sixty percent of the group (p) answered “Yes” to Item 72, and
40% (q) answered “No” (p always equals 1 - q). Assuming a normal
distribution of opinion on this item (varying from complete
agreement on through indifference to complete disagreement) upon
which a dichotomous division has been forced, we place the dividing
line between the “Yes” and “No” groups at a distance of 10% from
the middle of the curve, as shown in the figure below.
From Table 48, the height of the ordinate (i.e., z) which is 10% from
the mean of a normal distribution is .386.


Step 4


Having computed $M_p$, $M_q$, $\sigma$, p, q, and z, we find $r_{bis}$ from the
formula


$$r_{bis} = \frac{M_p - M_q}{\sigma} \times \frac{pq}{z} \qquad (87)$$


(bi-serial coefficient of correlation or bi-serial r)
in which, as illustrated by the problem above, and shown in Table 47


$M_p$ = mean of the group in the first category (usually the group
showing superior or more desirable characteristics)
$M_q$ = mean of the group in the second category
$\sigma$ = standard deviation of the entire group
p = proportion of the whole group in category one
q = proportion of the whole group in category two (p = 1 - q)
z = height of the ordinate in the normal curve dividing p from q


In Table 47, $r_{bis}$ is .27, indicating a tendency, though not a strong
one, for “Yes” answers to Item 72 to accompany high total scores.
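
Steps 1 through 4 may be sketched in Python as follows. This is an illustration under stated assumptions, not a standard routine: the function name `biserial_r` is hypothetical, and the ordinate z is computed from the normal curve by means of `statistics.NormalDist` (Python 3.8 or later) instead of being read from Table 48.

```python
from statistics import NormalDist


def biserial_r(mean_p, mean_q, sigma, p):
    """Formula (87): (M_p - M_q) / sigma * (p * q / z), where z is the
    height of the unit normal curve at the point dividing p from q."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    q = 1 - p
    std_normal = NormalDist()
    z = std_normal.pdf(std_normal.inv_cdf(p))
    return (mean_p - mean_q) / sigma * (p * q / z)


# Item 72 of Table 47: M_p = 60.08, M_q = 55.00, sigma = 11.63, p = .60.
r_bis = biserial_r(60.08, 55.00, 11.63, 0.60)
print(round(r_bis, 2))
```

The computed ordinate agrees with the tabled .386 to three places, and the coefficient rounds to .27.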
(2) THE SE OF BI-SERIAL r


Provided neither p nor q is very small (e.g., smaller than .05), an
approximate formula for the standard error of bi-serial r is


$$SE_{r_{bis}} = \frac{\frac{\sqrt{pq}}{z} - r_{bis}^2}{\sqrt{N}} \qquad (88)$$


(SE of $r_{bis}$, for values of p and q greater than .05)


A comparison of formula (88) with the classical $SE_r$ formula for a
product-moment r (see p. 197) shows that $SE_{r_{bis}}$ is somewhat larger
than $SE_r$ and becomes increasingly larger as the difference between
p and q widens: from p = .50, q = .50 to p = .95, q = .05, say. In the
problem of Table 47, $r_{bis}$ = .27 and $SE_{r_{bis}}$ = .12. To test the
reliability of this $r_{bis}$ in terms of its SE, we must assume that the sampling
distribution of r is normal, put the population r at the center of the
distribution (Fig. 46, p. 355), and take $SE_{r_{bis}}$ to be the SD of the
sampling distribution of $r_{bis}$. When we do this, the .95 confidence-interval
for the true $r_{bis}$ is from .03 to .51 ($r_{bis} \pm 1.96 \times SE_{r_{bis}}$, or .27 ± .24).
This wide range shows that $r_{bis}$ is probably indicative of some degree
of positive correlation (the lower limit of the confidence-interval is
.03), but it is impossible to say accurately just how much.
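
The standard error and the confidence-interval may be verified with a short Python sketch. It assumes formula (88) in the form SE = (sqrt(pq)/z - r²) / sqrt(N); the function name `se_biserial` is hypothetical. Because the SE is carried here to more decimals than in the hand calculation, the limits may differ from those in the text by a unit in the second decimal place.

```python
import math
from statistics import NormalDist


def se_biserial(r_bis, p, n):
    """Formula (88): approximate standard error of bi-serial r."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    q = 1 - p
    std_normal = NormalDist()
    z = std_normal.pdf(std_normal.inv_cdf(p))
    return (math.sqrt(p * q) / z - r_bis ** 2) / math.sqrt(n)


# Item 72 of Table 47: r_bis = .27, p = .60, N = 100.
se = se_biserial(0.27, 0.60, 100)
lower = 0.27 - 1.96 * se
upper = 0.27 + 1.96 * se
print(round(se, 2), round(lower, 2), round(upper, 2))
```

The lower limit remains above zero, which is the point of the argument: some positive correlation is indicated, but its size is poorly determined.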


TABLE 48 Deviates (x/$\sigma$) in terms of $\sigma$-units and ordinates (z) for given
areas measured from the mean of a normal distribution
whose total area = 1.00


 Area from               Area from
 the Mean    x/σ     z   the Mean    x/σ     z
   .00      .000   .399    .26      .706   .311
   .01      .025   .399    .27      .739   .304
   .02      .050   .398    .28      .772   .296
   .03      .075   .398    .29      .806   .288
   .04      .100   .397    .30      .842   .280
   .05      .126   .396    .31      .878   .271
   .06      .151   .394    .32      .915   .262
   .07      .176   .393    .33      .954   .253
   .08      .202   .391    .34      .995   .243
   .09      .228   .389    .35     1.036   .233
   .10      .253   .386    .36     1.080   .223
   .11      .279   .384    .37     1.126   .212
   .12      .305   .381    .38     1.175   .200
   .13      .332   .378    .39     1.227   .188
   .14      .358   .374    .40     1.282   .176
   .15      .385   .370    .41     1.341   .162
   .16      .412   .366    .42     1.405   .149
   .17      .440   .362    .43     1.476   .134
   .18      .468   .358    .44     1.555   .119
   .19      .496   .353    .45     1.645   .103
   .20      .524   .348    .46     1.751   .086
   .21      .553   .342    .47     1.881   .068
   .22      .583   .337    .48     2.054   .048
   .23      .613   .331    .49     2.326   .027
   .24      .643   .324    .50       ∞     .000
   .25      .675   .318
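
Entries of the kind given in Table 48 can be computed directly when the table is not at hand. The following is a minimal sketch using Python's `statistics.NormalDist`; the function name `deviate_and_ordinate` is hypothetical, and the area must be less than .50, since the deviate for an area of exactly .50 is infinite.

```python
from statistics import NormalDist


def deviate_and_ordinate(area_from_mean):
    """Return (x/sigma, z) for an area measured from the mean of the
    unit normal curve, with 0 <= area_from_mean < .50."""
    if not 0 <= area_from_mean < 0.5:
        raise ValueError("area must be at least 0 and less than .50")
    std_normal = NormalDist()
    x = std_normal.inv_cdf(0.5 + area_from_mean)
    return x, std_normal.pdf(x)


for area in (0.05, 0.10, 0.25, 0.40):
    x, z = deviate_and_ordinate(area)
    print(f"{area:.2f}  {x:.3f}  {z:.3f}")
```

Computed values may differ from the printed table by a unit in the third decimal place.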


(3) AN ALTERNATIVE FORMULA FOR BI-SERIAL r


There is another, slightly different, formula for bi-serial r
which is often useful. This is


$$r_{bis} = \frac{M_p - M_T}{\sigma} \times \frac{p}{z} \qquad (89)$$


(bi-serial coefficient of correlation or bi-serial r in terms of $M_T$, the
mean of the total group)


in which
$M_p$ = mean of the group in the first (or p) category
$M_T$ = mean of entire group
$\sigma$ = standard deviation of entire group
p = proportion of whole group in category one
z = height of ordinate in normal curve dividing p from q


Substituting in formula (89) the values for $M_p$, $M_T$, $\sigma$, p, and z,
shown in Table 47, we have


$$r_{bis} = \frac{60.08 - 58.05}{11.63} \times \frac{.60}{.386} = .27$$


which checks our previous result.
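
That formulas (87) and (89) must agree can be shown in a line or two of algebra. The only assumption is that the mean of the whole group is the weighted mean of the two sub-group means, with weights p and q = 1 - p.

```latex
\begin{aligned}
M_T &= pM_p + qM_q \\
M_p - M_T &= (1 - p)M_p - qM_q = q(M_p - M_q) \\
\frac{M_p - M_T}{\sigma} \times \frac{p}{z}
  &= \frac{q(M_p - M_q)}{\sigma} \times \frac{p}{z}
   = \frac{M_p - M_q}{\sigma} \times \frac{pq}{z}
\end{aligned}
```

With the figures of Table 47, .60(60.08) + .40(55.00) = 58.05, the mean of the total group.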

Formula (89) is especially well suited to those problems in which 
sub-groups having different characteristics are drawn from a larger 
group, the larger group mean ($M_T$) remaining the same.

The bi-serial correlation method has frequently been used in deter- 
mining item validity,* that is, in finding whether success or failure 
upon a given item is correlated with total score in the test or with 
score in some criterion (Table 47). If those who achieve high scores 
in the criterion get an item right more often than those who make 
low scores, the item will be positively correlated with the criterion. 
Such an item is a good measure of the criterion while one which cor- 
relates zero or negatively with criterion scores is a poor measure. 




(4) THE POINT BI-SERIAL COEFFICIENT 


When items are scored 1 if correct and 0 if incorrect, that is, as
either-or, the assumption of normality in the distribution of right-wrong
responses is unwarranted.† In such cases the point bi-serial r
rather than bi-serial r should be used. The point bi-serial method
assumes that the behavior which has been classified into two
categories can be thought of as occurring at two distinct points or modes
instead of along a graduated scale or continuum. Point bi-serial r
has proved to be useful in item analysis. The formula is


$$r_{p\,bis} = \frac{M_p - M_q}{\sigma} \times \sqrt{pq} \qquad (90)$$


(point bi-serial coefficient of correlation) 


While (87) is often used in item analysis, (90) is somewhat more 
defensible and is easier to apply. Point bi-serial r's are lower than 


* Long, J. A., and Sandiford, Peter, The Validation of Test Items, Department
of Educational Research, University of Toronto, 1935, Bulletin #3, 16-17.
† Richardson, M. W., and Stalnaker, J. L., “A Note on the Use of Bi-serial r
in Test Research,” Journal of Genetic Psychology, 1933, 8, 463-465.




bi-serial r's and are not directly comparable either to $r_{bis}$ or to
product-moment r’s. For example, the validity index of Item 72 (Table
47) by formula (90) is .21 as compared with the $r_{bis}$ of .27.
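
Formula (90) is simply the product-moment r between the scores and an item scored 1 or 0, the $\sigma$ of the total group being computed with N in the denominator. The Python sketch below illustrates this on a small set of made-up scores and then recomputes the index for Item 72; the function names and the eight hypothetical cases are assumptions introduced for illustration.

```python
import math
from statistics import mean, pstdev


def point_biserial(scores, item):
    """Formula (90): (M_p - M_q) / sigma * sqrt(p * q), item scored 1/0."""
    passed = [s for s, i in zip(scores, item) if i == 1]
    failed = [s for s, i in zip(scores, item) if i == 0]
    p = len(passed) / len(scores)
    q = 1 - p
    return (mean(passed) - mean(failed)) / pstdev(scores) * math.sqrt(p * q)


def pearson_r(x, y):
    """Product-moment r, using N (not N - 1) throughout."""
    mx, my = mean(x), mean(y)
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (pstdev(x) * pstdev(y))


scores = [12, 15, 9, 20, 17, 11, 14, 18]  # hypothetical total scores
item = [0, 1, 0, 1, 1, 0, 1, 1]           # hypothetical item responses

r_pbis = point_biserial(scores, item)
r_product_moment = pearson_r(scores, item)

# Item 72 of Table 47, from the summary values alone.
r_item72 = (60.08 - 55.00) / 11.63 * math.sqrt(0.60 * 0.40)
print(round(r_pbis, 3), round(r_product_moment, 3), round(r_item72, 2))
```

The first two printed values coincide, and the index for Item 72 rounds to .21.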


2. Tetrachoric correlation 


We have seen in the last section that when one variable is continuous
and is expressed in the form of test scores, and the other is
dichotomous or in a twofold classification, bi-serial r provides a
measure of relationship between the two. An extension of the problem
of finding correlation between categories to which bi-serial r is
not applicable presents itself when both variables are dichotomous.
We then have a 2 X 2 or fourfold table, from which a modified form
of the product-moment coefficient, called tetrachoric r, may be
calculated. Tetrachoric r is useful when one wishes to find the
relationship between two characters or attributes neither of which is directly
measurable, but both of which are capable of being separated into
two categories. Thus, if we wish to measure the correlation between
school attendance and employment, persons might be classified into
those who have attended high school and those who have not; and
into those who are employed and those who are unemployed. Or, if
we wish to discover the correlation between intelligence and social
maturity, children might be classified as above average and below
average in intelligence, on the one hand, and as socially mature and
socially immature on the other. Tetrachoric correlation assumes that
the two variables being studied are essentially continuous, and would
be normally distributed if it were possible to classify them more
exactly into finer groupings.


(1) CALCULATION OF TETRACHORIC r


Table 49 illustrates a 2 × 2 fold table, and shows the steps involved
in calculating tetrachoric r. The problem is to find whether a larger
number of successful than of unsuccessful salesmen tend to be
“socially well adjusted.” The data are artificial. The X-variable
(along the top of the diagram) is divided into two categories
“successful” and “unsuccessful”; and the Y-variable (along the left
of the diagram) is divided into two categories “socially well
adjusted” and “socially poorly adjusted.” The sums of the rows show
that sixty salesmen (a+b) out of the sample of 100 are classed as
well adjusted socially, and that forty salesmen (c+d) are classed as
poorly adjusted socially.* The proportions in each category (p and q)
are 60% and 40%, respectively. The sums of the columns show that
fifty-five of the 100 salesmen are classified as unsuccessful, and
forty-five as successful; the proportions are 55% (q') and 45% (p').
On the assumption that “social adjustment” is distributed normally,
from the proportions p = .60 and q = .40, we obtain an x = −.253 and
z = .386. These last two values are read from Table 48 as follows:
The perpendicular line (i.e., the ordinate, z) separating the upper
60% from the lower 40% in a normal curve is just 10% from the mean.
Hence, entering the first column of Table 48 with a = .10, we read
x = −.253 and z = .386. See diagram below.

[Diagram: normal curve divided into the “well adjusted” and “poorly
adjusted” groups]


TABLE 49 To illustrate the calculation of tetrachoric r ($r_t$)
(The data are hypothetical)

                                   X-variable: 100 Salesmen
                                   Unsuccessful   Successful   Totals
Y-variable
  Socially Well Adjusted             25 (b)         35 (a)       60     p = 60%
  Socially Poorly Adjusted           30 (d)         10 (c)       40     q = 40%
  Totals                             55             45          100
                                   q' = 55%       p' = 45%

For p = .60, q = .40, a = .10:   x = −.253, z = .386   (Table 48, Fig. 58)
For p' = .45, q' = .55, a = .05: x' = .126, z' = .396

$$\frac{ad - bc}{z z' N^2} = r_t + \frac{x x'}{2} r_t^2 \qquad (91)$$

$$\frac{1050 - 250}{100^2 (.386)(.396)} = r_t + \frac{(-.253)(.126)}{2} r_t^2$$

$$.523 = r_t - .016 r_t^2$$

$$.016 r_t^2 - r_t + .523 = 0 \; *$$

or

$$r_t = \frac{1 \pm \sqrt{1 - 4(.016)(.523)}}{2 \times .016} = \frac{1 \pm \sqrt{1 - .033472}}{.032} = \frac{1 \pm .9831}{.032}$$

= .53 (taking numerator as 1 − .9831)
= 62 (taking numerator as 1 + .9831)

* The general form of a quadratic equation is $ax^2 + bx + c = 0$.
The two values of x (i.e., the roots of the equation) may be computed
by the formula

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$

In the equation $.016 r_t^2 - r_t + .523 = 0$, a = .016; b = −1.00;
and c = .523. Hence,

$$r_t = \frac{1 \pm \sqrt{1 - 4(.016)(.523)}}{2 \times .016}$$

= .53 or 62 (an impossible value)


The x' and z' values corresponding to p' = .45 and q' = .55 are
calculated in the same way. The perpendicular line dividing the
upper 45% (the percent successful) from the lower 55% (the percent
unsuccessful) is 5% from the mean; and from Table 48, for a = .05,
x' = .126 and z' = .396. See diagram on page 365.

* To accord with the plan of the ordinary correlation table (p. 128),
the categories in Table 49 have been so arranged that concentration
of data in the first and third quadrants (a and d) denotes positive
correlation; concentration of data in the second and fourth (b and c)
quadrants negative correlation.

An approximate formula for tetrachoric r may be written as follows:

$$\frac{ad - bc}{z z' N^2} = r_t + \frac{x x'}{2} r_t^2 \qquad (91)$$

(approximate formula for tetrachoric r)
in which
x and x' = σ-distances from the means to the points separating the
        proportion in the upper category from the proportion in
        the lower category;
z and z' = the heights of the ordinates at the points of division;
a, b, c, d = entries in the four cells, see Table 49;
N = number of cases; i.e., sum of entries in the four cells;
$r_t$ = the tetrachoric coefficient of correlation.


In Table 49, ad is found to equal 1050, and bc to equal 250.
Substituting for these quantities, and for x, x', z, z', and N² in
formula (91), we obtain $r_t$ = .53. This coefficient indicates a
fairly substantial correlation between success in salesmanship and
social adjustment. In order to compute $r_t$ it is necessary that we
solve a quadratic equation. The method of carrying through this
solution is given in Table 49 and in the footnote at the bottom of
the table. Note that only the first of the two solutions for $r_t$ is
a possible value, as the second is greater than unity.
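
The substitution into formula (91) and the solution of the resulting quadratic can be sketched in Python as below. This is only an illustration under stated assumptions: the function name is hypothetical, and x, x', z and z' are supplied by hand from the normal-curve table rather than computed.

```python
import math

def tetrachoric_approx(a, b, c, d, x, x_prime, z, z_prime):
    """Admissible roots of formula (91) for a fourfold table."""
    n = a + b + c + d
    lhs = (a * d - b * c) / (z * z_prime * n ** 2)
    k = x * x_prime / 2.0
    # Formula (91): lhs = r + k * r**2, i.e. k * r**2 + r - lhs = 0.
    if k == 0.0:
        return [lhs]                       # the equation is linear in r
    disc = 1.0 + 4.0 * k * lhs
    if disc < 0.0:
        return []                          # no real solution
    roots = [(-1.0 + sign * math.sqrt(disc)) / (2.0 * k)
             for sign in (1.0, -1.0)]
    # A correlation cannot exceed unity in absolute value.
    return [r for r in roots if abs(r) <= 1.0]

# Table 49: a = 35, b = 25, c = 10, d = 30; x, z values as read in the text.
table_49_roots = tetrachoric_approx(35, 25, 10, 30,
                                    -0.253, 0.126, 0.386, 0.396)
print([round(r, 2) for r in table_49_roots])  # [0.53]
```

The second root of the quadratic (about 62) is discarded by the final filter, which mirrors the remark that only the first solution is a possible value.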

The investigator who finds it necessary to calculate many tetrachoric
r's may greatly shorten his work by using the computing diagrams
devised by Thurstone and his co-workers.* These charts enable one to
obtain a solution for $r_t$ by graphic methods as soon as the
proportion within each of the four cells of the table is known.


(2) THE SE OF A TETRACHORIC r


The formula for $SE_{r_t}$ is mathematically complex and is too long
to be useful practically. Its derivation can be found in books that
deal with the mathematics of statistical theory.† If a standard error is


* Chesire, L., Saffir, M., and Thurstone, L. L., Computing Diagrams
for the Tetrachoric Correlation Coefficient, University of Chicago
Bookstore, 1933.

† Peters, C. C., and Van Voorhis, W. R., Statistical Procedures and
Their Mathematical Bases (New York: McGraw-Hill, 1940), pp. 370-375.




wanted, an approximation to $SE_{r_t}$ may be found in the following
way. When p is close to .50 and N is large, $SE_{r_t}$ is about 70%
higher than the SE of a product-moment r of the same size as $r_t$
and based upon the same N. The SE of a product-moment r of .53 is .07
for N = 100. Hence, the $SE_{r_t}$ of an $r_t$ = .53 is approximately
.12 (.07 × 1.70). The .95 confidence-interval for the true $r_t$ is
.29 to .77 (i.e., .53 ± 1.96 × .12 or .53 ± .24). The obtained $r_t$
of .53 is, therefore, indicative of a positive r probably at least as
high as .29.
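
The approximation just described can be sketched in Python. One assumption is made that the text does not state: the SE of a product-moment r is taken from the classical large-sample formula (1 − r²)/√N, which does give .07 for r = .53 and N = 100. The function names are hypothetical.

```python
import math

def se_product_moment(r, n):
    """Assumed large-sample SE of a product-moment r: (1 - r^2) / sqrt(N)."""
    return (1.0 - r ** 2) / math.sqrt(n)

def se_tetrachoric_approx(r_t, n):
    """About 70% higher than the SE of a product-moment r of the same size."""
    return 1.70 * se_product_moment(r_t, n)

def confidence_interval_95(r_t, n):
    """The .95 confidence interval, r_t plus or minus 1.96 SE."""
    half_width = 1.96 * se_tetrachoric_approx(r_t, n)
    return r_t - half_width, r_t + half_width

se = se_tetrachoric_approx(0.53, 100)
low, high = confidence_interval_95(0.53, 100)
print(round(se, 2), round(low, 2), round(high, 2))  # 0.12 0.29 0.77
```

The sketch is meaningful only under the conditions named in the text, namely p close to .50 and a large N.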


(3) TETRACHORIC r IN TEST EVALUATION


Tetrachoric r is often used as a means of evaluating a test's
efficiency in separating two contrasted or “criterion” groups. An
example is given in Table 50 (the data are artificial). The problem
is to find whether a test of deductive reasoning (here, a syllogism
test) will differentiate fifty-nine college juniors majoring in
science from sixty-six college juniors majoring in literature or
languages (non-science). The X-variable is divided into science
majors and non-science majors; the Y-variable into those above and
those below the mean of the test, i.e., the mean score established by
the entire junior class. The entries in the cells, a, b, c, and d,
are expressed in percents, so that N² in formula (91) is 1.00. As
shown in Table 50, the correlation between majoring in science and
high scores on the syllogism test is .47. If one were investigating a
number of tests with a view toward determining their relative values
as indicators of scientific aptitude, the worth of each test could be
measured in accordance with its ability to separate the two criterion
groups.*


TABLE 50 To illustrate the use of tetrachoric r in evaluating a given test
N = 125

                              X-variable: College Juniors
                              Non-Science Majors   Science Majors   Totals
Y-variable
  Above mean of test              24% (b)             35% (a)       p = 59%
  Below mean of test              29% (d)             12% (c)       q = 41%
  Totals                          q' = 53%            p' = 47%      100%

For p = .59, q = .41:   x = −.228, z = .389
For p' = .47, q' = .53: x' = .075, z' = .398

$$\frac{.1015 - .0288}{(.389)(.398)} = r_t + \frac{(-.228)(.075)}{2} r_t^2$$

$$.470 = r_t - .009 r_t^2$$

or $.009 r_t^2 - r_t + .470 = 0$

$$r_t = \frac{1 \pm \sqrt{1 - 4(.009)(.470)}}{2 \times .009} = \frac{1 \pm .9915}{.018}$$

= .47, or 111 (an impossible value)


3. The phi-coefficient (fourfold point coefficient) 


In a 2 × 2 fold table, the φ-coefficient provides a measure of
correlation which is equivalent to r. Like the point bi-serial r, phi
measures relationship between items when the classification is truly
dichotomous and is concentrated at two separate points or into two
distinct classes. Phi is sometimes used also with continuous
variables which have been forced into two categories.

The diagrams below show the same fourfold tables; in the first the
entries represent frequencies or scores, in the second proportions.

         −      +                        −      +
   +     B      A     A+B          +     b      a      p
   −     D      C     C+D          −     d      c      q
        B+D    A+C     N                 q'     p'    1.00

The formula for phi in terms of frequencies is

$$\phi = \frac{AD - BC}{\sqrt{(A+B)(C+D)(B+D)(A+C)}} \qquad (92)$$

(phi-coefficient of correlation)
which expressed in proportions becomes

$$\phi = \frac{ad - bc}{\sqrt{pqp'q'}} \qquad (93)$$


* The phi-coefficient is also useful here.

The phi-coefficient must always be used to determine the significance
of the difference between correlated percents or proportions. In
example (2), page 237, for example,
$\phi = \frac{.12 - .02}{\sqrt{pqp'q'}}$, or .41. In general, φ is
lower than the corresponding $r_t$'s and is not comparable to them.
The phi-coefficient for the data in Table 49, for example, is .33 as
against an $r_t$ of .53; in Table 50 it is .31 as against an $r_t$
of .47.*
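
As a check on the value quoted for Table 49, formula (92) can be applied directly to the cell frequencies. The Python sketch below is illustrative only, and the function name is hypothetical.

```python
import math

def phi_from_frequencies(A, B, C, D):
    """Phi-coefficient, formula (92), from the four cell frequencies."""
    numerator = A * D - B * C
    denominator = math.sqrt((A + B) * (C + D) * (B + D) * (A + C))
    return numerator / denominator

# Table 49: A = 35, B = 25, C = 10, D = 30.
phi_table_49 = phi_from_frequencies(35, 25, 10, 30)
print(round(phi_table_49, 2))  # 0.33
```

The function fails with a division by zero if any row or column total is zero, in which case no correlation can be computed from the table.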


Phi is related to χ² by the equation $\chi^2 = N\phi^2$ or
$\phi = \sqrt{\chi^2 / N}$. The significance of a φ may be estimated,
therefore, by converting it into a χ² and determining the
significance of χ². φ is valuable when we want to know how
performance on one item is related to performance on another item
(see problem 5, p. 375). Phi has proved to be especially useful in
item analysis,† where the values 1 and 0 are usually assigned to
answers right and wrong.
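
The conversion between φ and χ² can be sketched as follows. The function names are hypothetical, and the critical value 6.635 (χ² at the .01 level for 1 df, the degrees of freedom of a fourfold table) is taken from standard χ² tables rather than from the passage above.

```python
import math

def chi_square_from_phi(phi, n):
    """Chi-square equivalent of a phi-coefficient: N * phi^2."""
    return n * phi ** 2

def phi_from_chi_square(chi_square, n):
    """Phi recovered from chi-square: sqrt(chi-square / N)."""
    return math.sqrt(chi_square / n)

CHI_SQUARE_01_LEVEL_1_DF = 6.635   # assumed tabled value, 1 df, .01 level

# Table 49: phi = .33 on N = 100 salesmen.
chi_square_49 = chi_square_from_phi(0.33, 100)
print(round(chi_square_49, 2))                      # 10.89
print(chi_square_49 >= CHI_SQUARE_01_LEVEL_1_DF)    # True
```

On this reckoning the φ of .33 for Table 49 would be judged significant at the .01 level.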


4. The contingency coefficient, C 


The coefficient of contingency, C, is used to determine relationship
when the variables under study have been put into two or more
classes or categories. The contingency coefficient can be derived
directly from χ² (p. 254); but C differs from χ² in that it provides
a measure of correlation ‡ which under certain conditions (p. 371) is
comparable to product-moment r. C bears the following relation
to χ²:

$$C = \sqrt{\frac{\chi^2}{N + \chi^2}} \qquad (94)$$

(formula for C, the contingency coefficient in terms of χ²)


In Table 33, page 263, the association between eyedness and
handedness was found to be expressed by a χ² of 4.02, which for 4 df
was not significant. By formula (94) the C for Table 33 is
$\sqrt{\frac{4.02}{413 + 4.02}}$ or .10 (to two decimals). Taken at
face value and alone, this C would indicate a negligible
relationship between eyedness and handedness. The SE needed to test
C is a complex expression laborious to compute; § so that the
significance of C is best tested by its equivalent χ². In the
present problem, the χ² of 4.02 is not significant and in
consequence our C of .10 is not significant.
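
Formula (94) is easily verified for this example. In the sketch below the function name is hypothetical, and N = 413 is taken as the number of cases in Table 33.

```python
import math

def contingency_from_chi_square(chi_square, n):
    """Contingency coefficient, formula (94): sqrt(chi^2 / (N + chi^2))."""
    return math.sqrt(chi_square / (n + chi_square))

# Table 33: chi-square = 4.02; N = 413 is assumed as the number of cases.
c_table_33 = contingency_from_chi_square(4.02, 413)
print(round(c_table_33, 2))  # 0.1
```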

* Guilford, J. P., and Perry, N. C., “Estimation of other
coefficients of correlation from the phi coefficient,”
Psychometrika, 1951, 16, 335-346.

† Guilford, J. P., ed., Printed Classification Tests, Report No. 5,
Army Air Forces Aviation Psychology Program Research Reports
(Washington, D. C.: U. S. Gov't Printing Office, 1947).

‡ χ² is a measure of probability of association.

§ Kelley, T. L., Fundamentals of Statistics (Cambridge, Mass.:
Harvard University Press, 1947).


(1) METHOD OF CALCULATING C


Table 51 illustrates the computation of C from a 4 × 4 fold
contingency table. The table gives the classification of 1000 fathers
and sons with respect to eye color.


TABLE 51 To illustrate the calculation of C, the coefficient of
contingency

(Independence values are shown in parentheses above each cell entry)

                              Father's Eye Color
Son's Eye Color      Blue     Gray     Hazel    Brown    Totals
                    (120)     (88)     (60)     (66)
  Blue               194       70       41       30       335
                    (102)     (75)     (51)     (56)
  Gray                83      124       41       36       284
                     (49)     (36)     (25)     (27)
  Hazel               25       34       55       23       137
                     (87)     (64)     (44)     (48)
  Brown               56       36       43      109       244
  Totals             358      264      180      198      1000

I. Independence Values

335 × 358/1000 = 120    284 × 358/1000 = 102    137 × 358/1000 = 49    244 × 358/1000 = 87
335 × 264/1000 =  88    284 × 264/1000 =  75    137 × 264/1000 = 36    244 × 264/1000 = 64
335 × 180/1000 =  60    284 × 180/1000 =  51    137 × 180/1000 = 25    244 × 180/1000 = 44
335 × 198/1000 =  66    284 × 198/1000 =  56    137 × 198/1000 = 27    244 × 198/1000 = 48

II. Calculation of C

(194)²/120 = 313.6    (83)²/102 =  67.5    (25)²/49 =  12.8    (56)²/87  =  36.0
(70)²/88   =  55.7    (124)²/75 = 205.0    (34)²/36 =  32.1    (36)²/64  =  20.3
(41)²/60   =  28.0    (41)²/51  =  33.0    (55)²/25 = 121.0    (43)²/44  =  42.0
(30)²/66   =  13.6    (36)²/56  =  23.1    (23)²/27 =  19.6    (109)²/48 = 247.5

S = 1270.8
N = 1000
S − N = 270.8

$$C = \sqrt{\frac{S - N}{S}} = \sqrt{\frac{270.8}{1270.8}} = .46$$


The independence values for each cell have been computed as shown in
Table 51. From the top row, for example, we know that 335/1000 of all
sons are described as blue-eyed. This proportion of 358 (i.e.,
335 × 358/1000) gives 120 as the number of fathers who can be
expected to have blue-eyed sons “by chance,” as contrasted with the
194 fathers who actually did have blue-eyed sons. When the
independence values have been found, we square each obtained cell
entry, and divide by its own independence value as shown in Table 51.
The sum of these quotients gives S; and from S and N, C is calculated
by the formula

$$C = \sqrt{\frac{S - N}{S}} \qquad (95)$$

(formula for C, coefficient of contingency, calculated directly)


In Table 51, C is .46. From (94) the χ² corresponding to this C is
268, which for 9 df is highly significant, far beyond the .01 level
(Table E).
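
The whole computation of Table 51 can be sketched in a few lines of Python. The function names are hypothetical. Because the independence values are carried here without rounding, S comes out slightly different from the 1270.8 of the table, although C is still .46 to two decimals.

```python
import math

def contingency_coefficient(table):
    """Return (C, S) for a contingency table, using formula (95)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    s = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            independence = row_totals[i] * col_totals[j] / n
            s += observed ** 2 / independence
    return math.sqrt((s - n) / s), s

def chi_square_from_c(c, n):
    """Formula (94) solved for chi-square: N * C^2 / (1 - C^2)."""
    return n * c ** 2 / (1.0 - c ** 2)

# Table 51: rows are sons (blue, gray, hazel, brown), columns are fathers.
eye_color = [
    [194, 70, 41, 30],
    [83, 124, 41, 36],
    [25, 34, 55, 23],
    [56, 36, 43, 109],
]
c_table_51, s_table_51 = contingency_coefficient(eye_color)
print(round(c_table_51, 2))                    # 0.46
print(round(chi_square_from_c(0.46, 1000)))    # 268
```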

C possesses certain advantages over φ and $r_t$. In computing C, for
example, no assumption of normality in the distributions of the
variables classified need be made; in fact any type of distribution,
skewed or rectangular, may be utilized. C may be either plus or
minus, the sign of the coefficient depending upon an inspection of
the contingency table itself. In Table 51 it is clear that
pigmentation of eyes in father and son is positively correlated * and
that C must be positive.

* We note, for example, that 194 blue-eyed fathers have blue-eyed
sons, while only 30 brown-eyed fathers have blue-eyed sons. Moreover,
109 brown-

A disadvantage of C is that it does not remain constant for the same
data when the number of classes varies. The C computed from a 2 × 2
or 3 × 3 table will ordinarily not be comparable to the C computed
for the same data from a 5 × 5 table, say. Furthermore, the maximum
value which C can take depends upon the fineness of the
classification used so that C is not directly comparable to bi-serial
r or to $r_t$. It can be shown that

when the number of classes =  2, the maximum C is .707
when the number of classes =  3, the maximum C is .816
when the number of classes =  4, the maximum C is .866
when the number of classes =  5, the maximum C is .894
when the number of classes =  6, the maximum C is .913
when the number of classes =  7, the maximum C is .926
when the number of classes =  8, the maximum C is .935
when the number of classes =  9, the maximum C is .943
when the number of classes = 10, the maximum C is .949

In the light of this table, Yule and Kendall * recommend that we
“restrict the use of the ‘coefficient of contingency’ to 5 × 5 or
finer classifications” in order that the maximum value of C may be as
near unity as possible. At the same time, we should avoid a too-fine
classification or C will be affected by slight or “casual
irregularities of no physical significance”; and, in addition, the
arithmetic of calculation will be greatly (and needlessly) increased.
A correction † for “broad categories” may be applied to C's
calculated from 4 × 4 fold or broader groupings if C is to be
compared with r. For 5 × 5 fold or finer classifications, this
correction is so small that for practical purposes it may be
disregarded.

The relation of C to r is, under certain conditions, very close. C is
substantially equivalent to r (1) when the grouping is relatively
fine (5 × 5 fold or finer); (2) when the sample is large; (3) when
the two variables may legitimately be classified into categories;
and (4) when we are justified in assuming that the variables under
investigation are normally distributed.


III. Curvilinear or Non-Linear Relationship


The relationship between the paired values of two sets of measures,
X and Y, may be described in a general way as “linear” or
“non-linear.” When the means of the arrays of the successive columns
and rows in a correlation table follow straight lines (at least
approximately), the regression is said to be linear or straight-line
(p. 138). When the drift or trend of the means of the arrays (columns
or rows) cannot be well described by a straight line, but can be
represented by a curve of some kind, the regression is said to be
curvilinear or in general non-linear.

* Yule, G. U., and Kendall, M. G., An Introduction to the Theory of
Statistics (12th ed.; London: C. Griffin, 1940).
† Peters, C. C., and Van Voorhis, W. R., op. cit., pp. 391-393.

Our discussion in Chapter 6 was concerned entirely with linear
relationship, the extent or degree of which is measured by the
product-moment coefficient of correlation, r. It sometimes happens in
mental measurement, however, that the relationship between two
variables is definitely non-linear; and when this is true, r is not
an adequate measure of the degree of correspondence or correlation.
When the regression is non-linear, a curve joining the means of
successive arrays (in the columns, say) will fit these mean values
more exactly than will a straight line. Hence, should a truly
curvilinear relationship be described by a straight line, the scatter
or spread of the paired values about the regression line will be
greater than the scatter about the better-fitting regression curve.
The smaller the spread of the paired scores about the regression line
or the regression curve which relates the variables X and Y (or Y and
X), the higher the relationship between the two variables. For this
reason, an r calculated from a correlation table in which the
regression is curvilinear will always be less than the true
relationship. An example will make this situation clearer. The
correlation between the following two short series, as given by the
product-moment formula, is r = .93 [formula (24), p. 139].

Variable X     Variable Y
    1             .25
    2             .50
    3            1.00
    4            2.00
    5            4.00

The true correlation between the two series, however, is clearly
perfect, since changes in Y are directly related to changes in X. As
X increases by 1 (i.e., in arithmetic progression) Y doubles (i.e.,
increases in geometric progression). The reason why r is less than
1.00 becomes obvious as soon as we plot the paired X and Y values. As
shown in Figure 60, the relationship between X and Y is curvilinear,
and is exactly described by a curve which passes through the
successively plotted points. When linear relationship is forced upon
these data, the plotted points do not fall along the straight line,
and the product-moment coefficient, r, is less than 1.00. However,
the correlation-ratio, or coefficient of non-linear relationship,
η (read as eta), for the given data is 1.00.
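
The r of .93 for these two series can be checked with a short Python sketch. The function name is hypothetical. The second computation, which correlates X with the logarithm of Y, is added only to show that the relationship is exact once the curve is straightened; it is not the correlation ratio η, whose calculation is not given here.

```python
import math

def pearson_r(xs, ys):
    """Product-moment coefficient of correlation for paired values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

x_series = [1, 2, 3, 4, 5]
y_series = [0.25, 0.50, 1.00, 2.00, 4.00]

r_linear = pearson_r(x_series, y_series)
r_after_log = pearson_r(x_series, [math.log2(y) for y in y_series])
print(round(r_linear, 2))      # 0.93
print(round(r_after_log, 2))   # 1.0
```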

FIG. 60 To illustrate non-linear relationship

True non-linear relationship is encountered in psychophysics and in
experiments dealing with fatigue, practice, forgetting, and learning.
Whenever an experiment is carried on to the point of diminishing
returns, relationship will necessarily be curvilinear. Most mental
and educational tests, however, when administered to large samples,
exhibit linear or approximately linear relationships. The coefficient
of correlation, r, therefore, has been employed in psychology and
education to a far greater extent than has η; and for this reason the
calculation of η is not given here.* If regression is significantly
non-linear, it makes considerable difference whether η or r is the
measure of relation. But if the correlation is low and the regression
not significantly curvilinear, r will give as adequate a measure of
relationship as η.

The coefficient of correlation has the advantage over η in that
knowing r we can write down at once the straight-line regression
equation connecting X and Y or Y and X. This is not possible with
the correlation ratio. In order to estimate one variable from another
(say, Y from X) when regression is non-linear, a curve must be fitted
to the means of the Y-columns. The equation of this curve then
serves as a “regression equation” from which estimates can be made.†


* See references, page 453.
† See Chapter 7.

PROBLEMS


1. Compute the correlation between the following two series of test scores 
by the rank-difference method and test its significance. 


Individual    Intelligence Score    Cancellation Score (A-Test + Number
                                    Group Checking Test)
1 185 110 
2 203 98 
3 188 118 
4 195 104 
5 176 112 
6 174 124 
7 158 119 
8 197 95 
9 176 94 
10 138 97 
11 126 110 
12 160 94 
13 151 126 
14 185 120 
15 185 118 


[Note: The cancellation scores are in seconds; hence the two smallest scores
numerically (i.e., 94) are highest and are ranked 1.5 each.]


2. Check the product-moment correlations obtained in problems 6 and 7, 
pages 150-151, Chap. 6, by the rank-difference method. 


3. The following data give the distributions of scores on the
Thorndike Intelligence Examination made by entering college freshmen
who presented 12 or more recommended units, and entering freshmen who
presented less than 12 recommended units. Compute bi-serial r by
formula (87) and test its significance.


Thorndike Scores    12 or more            Less than 12
                    recommended units     recommended units

    90-99                 6                     0
    80-89                19                     3
    70-79                31                     5
    60-69                58                    17
    50-59                40                    30
    40-49                18                    14
    30-39                 9                     7
    20-29                 5                     4
                        186                    80

4. The following data give the distributions of scores on a general
educational achievement test made by those who answered 50% or more,
and those who answered less than 50% of the items in an arithmetic
test correctly. Compute bi-serial r and test its significance.


Achievement       Subjects answering 50%     Subjects answering less
Test Scores       or more of the items on    than 50% of the items on
                  arithmetic test correctly  arithmetic test correctly

185-194 7 0 
175-184 16 0 
165-174 10 6 
155-164 35 15 
145-154 24 40 
135-144 15 26 
125-134 10 13 
115-124 3 5 
105-114 0 5 

120 110 


5. Compute tetrachoric r's for the following tables and test for
significance.
(1) Relation of alcoholism and health in 811 fathers and sons.
Entries are expressed as proportions.

                                  Sons
                       Unhealthy   Healthy   Totals
Fathers
  Non-Alcoholic          .343       .405      .748
  Alcoholic              .102       .151      .252
  Totals                 .445       .556     1.000

(2) Correspondence of Yes and No answers to two items of a neurotic
inventory.

                          Question 1
                       No      Yes     Totals
Question 2
  Yes                  83      187      270
  No                  102       93      195
  Totals              185      280      465

6. (a) Compute the φ-coefficients for the two tables in example (10),
p. 245. Test the significance of φ by method on p. 368.
(b) Compute the $r_{p_{bis}}$ for example (4), above.
(c) Compute φ for the table in 5 (2) above.

7. Calculate the coefficient of contingency, C, for the two tables
given below.

(1)                  Marriage-Adjustment Score of Husbands
                 Very Low    Low    High    Very High    Totals
Graduate work        4         9     38        54          105
College             20        31     55        99          205
$2 High School 23 Z 4 51 152 
gm Grade School 11 n 19 51 
Totals              58        87    145       223          513

(2)                      Kind of Music Preferred
            English   French   German   Italian   Spanish   Totals
English        32       16       75        47        30       200
French         10       67       42        41        40       200
German         12       23      107        36        22       200
Italian        16       20       44        76        44       200
Spanish         8       53       30        43        66       200
Totals         78      179      298       243       202      1000


8. Convert the C's in example 7, above, to χ²'s and test for significance.
9. Compute C for example 3, Chapter 6, page 149. 


10. (a) In the following table, compute r by the product-moment method. 
(b) Plot the relationship between X and Y as shown in Figure 60, 
page 373. Is the relation linear? 


ANSWERS
1. ρ = .19; not significant
3. $r_{bis}$ = .34; $SE_{r_{bis}}$ = .07; significant at .01 level
4. $r_{bis}$ = .47; $SE_{r_{bis}}$ = .07; significant at .01 level
5. (1) $r_t$ = −.09; not significant; $SE_{r_t}$ = .06 (approximately)
   (2) $r_t$ = .33; $SE_{r_t}$ = .07 (approximately). Significant at .01 level.
6. (a) φ = .25 in (a) and φ = .50 in (b). Both φ's are significant by
   χ² test.
   (b) $r_{p_{bis}}$ = .38    (c) φ = .22


(D СЕ ра (2) 0 — 40 
8. (1) Significant by χ² test at .01 level
   (2) Significant by χ² test at .01 level


. С=Л2 
10. (a) r = .85    (b) Relationship is non-linear


15


PARTIAL AND MULTIPLE CORRELATION 


+ 


I. The Meaning of Partial and Multiple Correlation 


Partial and multiple correlation represent an important extension
of the theory and technique of simple or two-variable correlation to
problems which involve three or more variables. In computing the
correlation between two sets of scores, it is often desirable to
allow for the influence of factors which through their common
relationship to the variables being correlated obscure results or
make them difficult to interpret. To illustrate, suppose that the
correlation between intelligence test scores and chronological age in
a large group of children, seven to fourteen years old, is .50; that
the correlation between school achievement and age in the same group
is .40; and that the correlation between intelligence and school
achievement is .70. Since intelligence test scores and school
achievement both increase with age (the correlations are .50 and
.40) the correlation between these two measures will be raised when
age is allowed to vary. The correlation coefficient of .70,
therefore, is not only a measure of the role of intelligence in
school achievement, but is a measure of the influence of intelligence
plus the indirect effects of differences in age or maturity upon
school achievement.

To discover the relationship between intelligence and school
achievement, uninfluenced by maturity, we must rule out or control
the factor of age. This could be accomplished experimentally by
selecting children all of whom are of the same age. But this
procedure offers many difficulties, the principal one being that it
is well-nigh impossible to find a large sample of children of exactly
the same age. It becomes necessary, then, to determine what age range
is permissible; and the more closely we limit our group with respect
to age, the smaller the number left. In fact, the experimental
control of a variable by the method of selection may so limit the
size of the group that correlations are of doubtful value.

Because of the difficulties which arise in attempting to control a
variable (or variables) experimentally, the method of partial
correlation is often employed. By this method the relationship
between two variables can be determined when one or more related
variables are held constant. Thus, the partial correlation between
general intelligence and school achievement, i.e., the correlation
with age “partialled out,” gives us the correlation between these two
variables uninfluenced by the factor of age differences. Such a
partial coefficient represents the net correlation between general
intelligence and school achievement for children of the same age; or
the net correlation between intelligence and school achievement when
age is a constant factor. Expressed in still another way, our partial
coefficient tells us what relationship exists between general
intelligence test scores and school achievement when differences in
maturity no longer affect either variable.
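
For the three correlations supposed above (.70, .50 and .40), the partial coefficient can be worked out with the first-order partial formula, which appears later as formula (96) in Table 52. The Python sketch below is illustrative only; the function and variable names are hypothetical.

```python
import math

def partial_r(r_12, r_13, r_23):
    """Correlation of variables 1 and 2 with variable 3 held constant."""
    return (r_12 - r_13 * r_23) / (
        math.sqrt(1.0 - r_13 ** 2) * math.sqrt(1.0 - r_23 ** 2))

# 1 = intelligence, 2 = school achievement, 3 = chronological age.
r_intelligence_achievement = 0.70
r_intelligence_age = 0.50
r_achievement_age = 0.40

net_r = partial_r(r_intelligence_achievement,
                  r_intelligence_age,
                  r_achievement_age)
print(round(net_r, 2))  # 0.63
```

On these supposed figures the correlation between intelligence and school achievement falls from .70 to about .63 when age is held constant.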

A second illustration of partial correlation may be helpful. A
teacher finds in her class a correlation of .60 between test scores
in history and arithmetic. In looking for an explanation of this
correlation (since there is apparently little reason to expect a high
relationship between these two abilities), she finds that achievement
in arithmetic seems to depend in part upon ability to read and
understand the problems. Obviously, ability to read well is also an
important factor in determining achievement in history. Suppose that
our teacher calculates the correlations of history and arithmetic
with a third test, namely, one of reading comprehension. Knowing
these r's, she may determine (by methods given on p. 387) the net or
partial correlation between history and arithmetic when differences
in reading comprehension have been allowed for. If this partial
coefficient is .30, say, considerably smaller than the “whole”
coefficient (of .60) between history and arithmetic, the hypothesis
that the apparent relationship was due in part to the common
dependence of both tests upon reading is verified. When a factor (or
factors) is “partialled out” from a given correlation the effect is
to eliminate the differences among individuals introduced by the
variable thus controlled. The method of eliminating factor
variability through partial correlation may be employed whenever the
correlation can be computed between the factor or factors to be
controlled and the two variables the net correlation of which we are
seeking. Since all of the data are utilized, partial correlation has
a decided advantage over experimental control in many problems.

In addition to its value as a means of controlling conditions by
eliminating the effects of “disturbing” or other variables, partial
correlation is useful in other ways. It enables us, for example, to
build up a regression equation involving three or more variables from
which a “criterion” score may be predicted when we know the scores
made by a subject on several correlated tests. The accuracy of the
regression equation in estimating criterion scores (its reliability
as a “prediction” instrument) can be determined by the coefficient of
multiple correlation. A multiple correlation coefficient gives the
correlation between a single test or criterion on the one hand and a
team of tests on the other. The meaning of the multiple coefficient
of correlation will be better understood when the student has worked
through an actual problem such as that given in Table 52.


II. An Illustrative Correlation Problem Involving
Three Variables 


Perhaps the most straightforward approach to an understanding 
of the meaning of partial and multiple correlation, and of the tech- 
niques of calculation involved, is through the solution of a problem. 
The present section, therefore, will show the application of partial 
and multiple correlation to a three-variable problem. Following this, 
the general formulas and further applications of the method will be 
considered. 
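
The arithmetic of the three-variable problem worked out in Table 52 can be sketched in Python as below. The primary data (means, σ's and r's for N = 450) are those of Step 1 of the table; the function and variable names are hypothetical. Intermediate values are rounded as they are in the table, which is an assumption made here so that the printed figures are reproduced; carrying full precision changes some second decimals (R, for example, comes out nearer .82 than .83).

```python
import math

def partial_r(r_ab, r_ac, r_bc):
    """First-order partial r of a and b with c held constant, formula (96)."""
    return (r_ab - r_ac * r_bc) / (
        math.sqrt(1.0 - r_ac ** 2) * math.sqrt(1.0 - r_bc ** 2))

# Step 1. 1 = honor points, 2 = general intelligence, 3 = hours of study.
m1, m2, m3 = 18.5, 100.6, 24.0
s1, s2, s3 = 11.2, 15.8, 6.0
r12, r13, r23 = 0.60, 0.32, -0.35

# Step 2. Partial coefficients of correlation.
r12_3 = round(partial_r(r12, r13, r23), 2)
r13_2 = round(partial_r(r13, r12, r23), 2)
r23_1 = round(partial_r(r23, r12, r13), 2)

# Step 4. Partial sigmas, formula (97).
s1_23 = round(s1 * math.sqrt(1 - r12 ** 2) * math.sqrt(1 - r13_2 ** 2), 1)
s2_13 = round(s2 * math.sqrt(1 - r23 ** 2) * math.sqrt(1 - r12_3 ** 2), 1)
s3_12 = round(s3 * math.sqrt(1 - r23 ** 2) * math.sqrt(1 - r13_2 ** 2), 1)

# Step 5. Partial regression coefficients, formula (102), and constant K.
b12_3 = round(r12_3 * s1_23 / s2_13, 2)
b13_2 = round(r13_2 * s1_23 / s3_12, 2)
k = round(m1 - b12_3 * m2 - b13_2 * m3)

# Steps 6 and 7. Standard error of estimate and multiple R, formula (107).
se_estimate = s1_23
multiple_r = round(math.sqrt(1 - s1_23 ** 2 / s1 ** 2), 2)

print(r12_3, r13_2, r23_1)        # 0.8 0.71 -0.72
print(s1_23, s2_13, s3_12)        # 6.3 8.9 4.0
print(b12_3, b13_2, k)            # 0.57 1.12 -66
print(se_estimate, multiple_r)    # 6.3 0.83
```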

The problem in Table 52 is taken from a study * of the factors 
which influence “academic success.” In that part of the study from 
which the present data are drawn, the problem was to discover how 
accurately one can predict the academic success of freshmen from a 
knowledge of their general intelligence and of their study habits. 
Academic success was defined specifically as the number of credit or 
“honor” points obtained by a student at the end of his first semester 
in college. The number of honor points earned depended upon the 
number of A, B, and C grades made by the student in his freshman 
courses. A grade of A carried three honor points; a grade of B two 
honor points; a grade of C one honor point; and a grade of D, which 


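* May, M. A., “Predicting Academic Success,” Journal of Educational
Psychology, 1923, 14, 429-440.

TABLE 52 A correlation problem involving three variables
(To illustrate partial and multiple correlation)


Step 1. Primary Data (N = 450)

    (1) Honor Points      (2) General Intelligence      (3) Average Hours of Study per Week
    $M_1 = 18.5$          $M_2 = 100.6$                 $M_3 = 24$
    $\sigma_1 = 11.2$     $\sigma_2 = 15.8$             $\sigma_3 = 6$
    $r_{12} = .60$        $r_{13} = .32$                $r_{23} = -.35$

Step 2. Calculation of Partial Coefficients of Correlation

    $$ r_{12.3} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}} = \frac{.60 - .32(-.35)}{.9474 \times .9367} = .80 $$   (96)

    $$ r_{13.2} = \frac{r_{13} - r_{12} r_{23}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{23}^2}} = \frac{.32 - .60(-.35)}{.8000 \times .9367} = .71 $$   (96)

    $$ r_{23.1} = \frac{r_{23} - r_{12} r_{13}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{13}^2}} = \frac{(-.35) - .60 \times .32}{.8000 \times .9474} = -.72 $$   (96)

Step 3. The Regression Equations and Partial Regression Coefficients

    $$ x_1 = b_{12.3} x_2 + b_{13.2} x_3 $$   (Deviation Form)   (98)
    or $$ X_1 = b_{12.3} X_2 + b_{13.2} X_3 + K $$   (Score Form)   (99)
    in which $$ b_{12.3} = r_{12.3} \frac{\sigma_{1.23}}{\sigma_{2.13}} $$ and $$ b_{13.2} = r_{13.2} \frac{\sigma_{1.23}}{\sigma_{3.12}} $$   (102)

Step 4. Calculation of the Partial σ's

    (1) $$ \sigma_{1.23} = \sigma_1 \sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{13.2}^2} = 11.2 \times .8000 \times .7042 = 6.3 $$   (97)
    (2) $$ \sigma_{2.13} = \sigma_2 \sqrt{1 - r_{23}^2}\,\sqrt{1 - r_{12.3}^2} = 15.8 \times .9367 \times .6000 = 8.9 $$   (97)
    (3) $$ \sigma_{3.12} = \sigma_3 \sqrt{1 - r_{23}^2}\,\sqrt{1 - r_{13.2}^2} = 6 \times .9367 \times .7042 = 4.0 $$   (97)

Step 5. Calculation of the Partial Regression Coefficients, and Partial
Regression Equation

    Substituting for $r_{12.3}$, $r_{13.2}$, $\sigma_{1.23}$, $\sigma_{2.13}$, $\sigma_{3.12}$, we have:

    $$ b_{12.3} = .80 \times \frac{6.3}{8.9} = .57; \quad b_{13.2} = .71 \times \frac{6.3}{4.0} = 1.12 $$

    Hence the regression equation becomes:
    $$ x_1 = .57 x_2 + 1.12 x_3 $$   (Deviation Form)
    or $$ X_1 = .57 X_2 + 1.12 X_3 - 66 $$   (Score Form)

Step 6. Calculation of the Standard Error of Estimate

    $$ \sigma_{(est.\,X_1)} = \sigma_{1.23} = 6.3 $$

Step 7. Calculation of the Coefficient of Multiple Correlation

    $$ R_{1(23)} = \sqrt{1 - \frac{\sigma_{1.23}^2}{\sigma_1^2}} = .83 $$   (106)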


was a passing mark, carried no honor point credit. The maximum 
number of points which a freshman taking the regulation number 
of courses in one semester could obtain was forty-eight. 

General intelligence was measured by a combination of the Miller
Mental Ability Test and the Dartmouth Completion of Definitions
Test. The first test contains 120 items and the second 40, so that the
maximum score was 160. The scores of the 450 students in this sample
ranged from 50 to 150, the distribution being fairly normal. As
a measure of interest and application it was decided to take the
average number of hours per week spent in study. Information with
regard to study habits was obtained by means of a questionnaire
given at the beginning and again at the middle of the first semester.
Among other items in the questionnaire upon which information was
requested were the number of hours spent per week at meals, in
sleeping, etc. These and other questions were included in order that
the student might think that he was being checked upon the distribu- 
tion of his total time and not upon his study habits alone. The cor- 
relation between the student's estimates of the number of hours spent 
in study, given on the first and second questionnaires, was .86, indi- 
cating a satisfactory degree of reliability. 

As stated above, the main object of this study was to find how
accurately the number of honor points which a student earns can be
predicted from a knowledge of his study habits and his general
intelligence. Other factors, of course, such as health, personality,
previous preparation, and the like, are undoubtedly of importance in
determining the number of honor points received. The two factors
selected were chosen because they are important and are also objective
and measurable. As the first step in solving our problem, we
shall calculate the partial coefficient which shows to what extent
honor points are related to general intelligence when the variable
factor of study hours per week is held constant. Next the partial
coefficient will be calculated which shows to what extent honor points
are related to study hours when the variable effect of general
intelligence is rendered constant. Apart from the employment of these
partial coefficients in the regression equation from which we predict
honor points, the information which they yield will prove in itself to
be of considerable interest. The solution of the problem is outlined in
the following series of steps; the necessary data and calculations will
be found in Table 52.


Step 1


The mean and σ of each series of measures and the intercorrelations
are first calculated. These intercorrelations are product-moment
r's computed as shown in Chapter 6. The correlation between
(1) honor points and (2) general intelligence, written $r_{12}$, is
.60; the correlation between (1) honor points and (3) the number of
hours per week spent on the average in study, written $r_{13}$, is .32; and
the correlation between (2) general intelligence and (3) hours of
study per week, written $r_{23}$, is -.35. The low correlation between
honor points and study hours is of decided interest; but the most
surprising correlation is the -.35 between study hours and general
intelligence. Evidently the brighter the student, the less he
studies.


Step 2 


Having found the intercorrelations of our three variables, we may
then calculate the net correlation between (1) honor points and (2)
general intelligence with the influence of (3) study hours partialled
out or held constant. This net or partial coefficient of correlation,
written $r_{12.3}$, is found from the following formula:

$$ r_{12.3} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}} $$   (96), page 388

Substitution of the values for $r_{12}$, $r_{13}$, and $r_{23}$ in the formula gives a
partial coefficient, $r_{12.3}$, of .80. This means that if all of our 450
students had studied exactly the same number of hours per week, the
coefficient of correlation between honor points earned and general
intelligence test scores would have been .80 instead of .60. In other
words, when all students spend the same number of hours in study,
there is a closer correspondence between general intelligence test
score and honor points earned than there is when the number of
study hours varies.
The partial coefficient of correlation between (1) honor points and
(3) hours spent in study per week with (2) general intelligence
partialled out, or its influence held constant, is found from the formula

$$ r_{13.2} = \frac{r_{13} - r_{12} r_{23}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{23}^2}} $$   (96)

Substitution of the values for $r_{13}$, $r_{12}$, and $r_{23}$ gives a partial
coefficient, $r_{13.2}$, of .71, as against an obtained coefficient ($r_{13}$) of .32.
This result means that if our group possessed the same general
intelligence * there would be a much closer correspondence between the
number of honor points received and the number of hours spent in
study than there is when the members of the group possess varying
degrees of intelligence. This is certainly the result to be expected.

* By “same general intelligence” is meant the same score on the given general
intelligence tests.

The last partial coefficient of correlation $r_{23.1}$ equals -.72. This
coefficient gives the net correlation between (2) general intelligence
and (3) study hours when the influence of (1) honor points is held
constant. It is found from the formula

$$ r_{23.1} = \frac{r_{23} - r_{12} r_{13}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{13}^2}} $$   (96)

Like the two partial r's above, we may interpret $r_{23.1}$ to mean that
the correlation between general intelligence and hours spent in study
in a group in which every student earns the same number of honor
points would be much higher (in the inverse direction) than the
“raw” correlation between the same two factors in an unselected
group. By an unselected group is meant here a group in which the
number of honor points received by different students varies. It
seems evident that the brighter student not only studies less than the
average and dull (since $r_{23}$ = -.35) but that the brighter the student,
the less he needs to study in order to reach a given standard of
academic success, that is, to earn a given number of honor points.
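
The three first-order partials of Step 2 can be checked with a few lines of Python. This is only a sketch: the helper name `partial_r` is ours, not the text's, and the inputs are the zero-order r's of Table 52.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial r of x and y with z held constant, formula (96)."""
    return (r_xy - r_xz * r_yz) / (math.sqrt(1 - r_xz**2) * math.sqrt(1 - r_yz**2))

# Zero-order correlations from Table 52
r12, r13, r23 = 0.60, 0.32, -0.35

r12_3 = partial_r(r12, r13, r23)  # honor points vs. intelligence, study hours constant
r13_2 = partial_r(r13, r12, r23)  # honor points vs. study hours, intelligence constant
r23_1 = partial_r(r23, r12, r13)  # intelligence vs. study hours, honor points constant

print(round(r12_3, 2), round(r13_2, 2), round(r23_1, 2))
```

Rounded to two places, the results agree with the .80, .71, and -.72 reported in the text.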


Step 3 


Knowing the partial coefficients of correlation, we may write the
multiple regression equation from which the most probable number of
honor points a student will receive may be estimated when we know
his score in the general intelligence test and the number of hours he
studies per week. The regression equation for three variables (in
deviation form) is as follows:

$$ x_1 = b_{12.3} x_2 + b_{13.2} x_3 $$   (98), page 391

In this equation $x_1$ stands for honor points and is the dependent
variable or criterion; $x_2$ and $x_3$ stand for general intelligence and
study hours, respectively, and are the independent variables. Note
the resemblance of this equation to the simple regression equation for
two variables $y = b \times x$ (p. 155). If $x_1$ is put for $y$, and $x_2$ for $x$ in
the two-variable equation, we have $x_1 = b_{12} \times x_2$.

When written in score form, the multiple regression equation for
three variables becomes

$$ (X_1 - M_1) = b_{12.3}(X_2 - M_2) + b_{13.2}(X_3 - M_3) $$

or transposing and collecting terms,

$$ X_1 = b_{12.3} X_2 + b_{13.2} X_3 + K \text{ (a constant)} $$   (99), page 391


It is clear that before we can use this equation we must find the
value of the partial regression coefficients $b_{12.3}$ and $b_{13.2}$. These may
be found from the formulas

$$ b_{12.3} = r_{12.3} \frac{\sigma_{1.23}}{\sigma_{2.13}} \quad \text{and} \quad b_{13.2} = r_{13.2} \frac{\sigma_{1.23}}{\sigma_{3.12}} $$   (102), page 392

and, as we already have the values of $r_{12.3}$ and $r_{13.2}$, it is only necessary
that we find $\sigma_{1.23}$, $\sigma_{2.13}$, and $\sigma_{3.12}$ (the partial σ's) in order to
replace the partial regression coefficients in the equation by numerical
values.

Note that the partial coefficient of correlation $r_{23.1}$, although of
interest as giving us the relation between general intelligence and
hours spent in study for a constant number of honor points earned, is
not actually needed in the regression equation $x_1 = b_{12.3} x_2 + b_{13.2} x_3$.
In order to evaluate the constants $b_{12.3}$ and $b_{13.2}$ in our regression
equation, we need only $r_{12.3}$ and $r_{13.2}$. In fact, in any problem
involving three variables, only two partial coefficients of correlation
need be computed, if we are interested primarily in the prediction of
$X_1$ scores from known values of $X_2$ and $X_3$.


Step 4 


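The partial σ's may be found from the formulas

$$ \sigma_{1.23} = \sigma_1 \sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{13.2}^2} $$
$$ \sigma_{2.13} = \sigma_{2.31} = \sigma_2 \sqrt{1 - r_{23}^2}\,\sqrt{1 - r_{12.3}^2} $$   (97), page 390
$$ \sigma_{3.12} = \sigma_{3.21} = \sigma_3 \sqrt{1 - r_{23}^2}\,\sqrt{1 - r_{13.2}^2} $$

Substituting the known values of the raw and partial r's in these formulas
we find that $\sigma_{1.23}$ = 6.3; $\sigma_{2.13}$ = 8.9; and $\sigma_{3.12}$ = 4.0. (For the
calculations see Table 52.)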
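
As a rough check on Step 4, the partial σ's can be recomputed from the rounded values in Table 52. The function name `partial_sigma` is an illustrative choice of ours; the arithmetic is that of formula (97) for three variables.

```python
import math

def partial_sigma(sigma, r_zero, r_partial):
    """Second-order partial sigma for three variables, formula (97)."""
    return sigma * math.sqrt(1 - r_zero**2) * math.sqrt(1 - r_partial**2)

# Values from Table 52; the partial r's are the rounded results of Step 2
s1, s2, s3 = 11.2, 15.8, 6.0
r12, r23 = 0.60, -0.35
r12_3, r13_2 = 0.80, 0.71

s1_23 = partial_sigma(s1, r12, r13_2)  # variability of honor points, X2 and X3 ruled out
s2_13 = partial_sigma(s2, r23, r12_3)  # second form, which avoids r23.1
s3_12 = partial_sigma(s3, r23, r13_2)  # second form, which avoids r23.1

print(round(s1_23, 1), round(s2_13, 1), round(s3_12, 1))
```

To one decimal place these agree with the 6.3, 8.9, and 4.0 of the table.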


Step 5 


From the partial σ's and the partial r's the numerical values of
the partial regression coefficients $b_{12.3}$ and $b_{13.2}$ are found to be .57
and 1.12, respectively. We may now write the multiple regression
equation in deviation form as

$$ x_1 = .57 x_2 + 1.12 x_3 $$

In order to write this multiple regression equation in score form
we replace $x_1$ by ($X_1$ - 18.5); $x_2$ by ($X_2$ - 100.6); and $x_3$ by
($X_3$ - 24). The equation then becomes

$$ X_1 = .57 X_2 + 1.12 X_3 - 66 $$

Given a student's general intelligence test score ($X_2$) and the number
of hours per week he spends in study ($X_3$), we can estimate from
this equation the “most probable” number of honor points he will
receive during his first semester in college. Suppose that student
J. N. has a general intelligence test score of 120 and that he studies
on the average 20 hours per week: how many honor points will he
then most probably receive during the first semester? Substituting
$X_2$ = 120 and $X_3$ = 20 in the regression equation, we find
that

$$ X_1 = (.57 \times 120) + (1.12 \times 20) - 66 = 25 $$

The most probable number of honor points which student J. N. will
receive, therefore, using the given measures as the basis of our
forecast, is 25.
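
The arithmetic of Step 5 can be sketched in Python as follows. The function name `predict_honor_points` is hypothetical; the inputs are the rounded partial r's and partial σ's of Table 52, and the constant K is obtained from the three means.

```python
# Rounded values from Table 52
M1, M2, M3 = 18.5, 100.6, 24.0
r12_3, r13_2 = 0.80, 0.71
s1_23, s2_13, s3_12 = 6.3, 8.9, 4.0

# Partial regression coefficients, formula (102), rounded as in the text
b12_3 = round(r12_3 * s1_23 / s2_13, 2)
b13_2 = round(r13_2 * s1_23 / s3_12, 2)

# Constant of the score-form equation: K = M1 - b12.3*M2 - b13.2*M3
K = M1 - b12_3 * M2 - b13_2 * M3

def predict_honor_points(x2, x3):
    """Most probable honor points from intelligence score x2 and study hours x3."""
    return b12_3 * x2 + b13_2 * x3 + K

jn = predict_honor_points(120, 20)  # student J. N.
print(b12_3, b13_2, round(K), round(jn))
```

The unrounded constant is a little above -66, so the forecast for J. N. comes out near 25 whether K is rounded first or not.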


Step 6 


This forecast, like every other “most probable” number of honor
points predicted from the regression equation, has an “error of
estimate.” The standard error of estimate of any $X_1$ predicted from the
regression equation, $X_1 = b_{12.3} X_2 + b_{13.2} X_3 + K$, is written $\sigma_{(est.\,X_1)}$,
and equals $\sigma_{1.23}$ directly (p. 381).

The standard error of estimate in the present problem is 6.3, and
in the illustration given above, the twenty-five honor points
estimated for J. N. have a $SE_{(est.\,X_1)}$ of about 6 points. This means
that the chances are about two in three that our forecast of twenty-five
honor points will not miss the actual number of honor points
received by J. N. by more than ±6. In general we may say that
two-thirds of all predicted honor point values will lie within ±6 points
of their actual values.


Step 7


The final step in the solution of our three-variable correlation
problem is the computation of the coefficient of multiple correlation.
“Multiple r,” generally written R, is defined (see p. 380) as the
coefficient of correlation between scores actually made on the criterion
test and scores on the same test predicted from the regression
equation. For the data of Table 52, R gives the correlation between earned
honor points ($X_1$) and honor points estimated by means of the two
variables, general intelligence ($X_2$) and hours of study ($X_3$), when
these two are combined into a team by means of the regression
equation. The formula for R when we are dealing with three variables is

$$ R_{1(23)} = \sqrt{1 - \frac{\sigma_{1.23}^2}{\sigma_1^2}} $$   (106), page 395

In the present problem $R_{1(23)}$ = .83. This means that if the most
probable number of honor points which each student in our group of
450 will receive is predicted from the regression equation given on
page 381, the correlation between these 450 predicted scores and the
450 scores actually received will be .83. Multiple R tells us to what
extent $X_1$ is determined by the combined action of $X_2$ and $X_3$; or, in
the present instance, to what extent honor points are related to
general intelligence together with number of study hours per week.
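
A minimal numerical check of formula (106) for the data of Table 52, using the rounded partial σ of Step 4 (the variable names are ours):

```python
import math

sigma_1 = 11.2     # SD of honor points (Table 52)
sigma_1_23 = 6.3   # partial sigma from Step 4, rounded as in the text

# Formula (106) for three variables
R = math.sqrt(1 - sigma_1_23**2 / sigma_1**2)
print(round(R, 2))
```

With these rounded inputs R comes out close to .83, as in the text.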

The methods described in this section are not practicable when 
there are more than four variables. For multiple correlation prob- 
lems involving a large number of tests it is advisable to use short-cut 
methods to lessen the amount of numerical calculation. An efficient 
and timesaving method is described in Chapter 7 and Appendix A 
of R. L. Thorndike’s Personnel Selection (New York: John Wiley
and Sons, 1949).*


* See also Chapter 16.


III. General Formulas for Use in Partial
and Multiple Correlation


1. Partial r's of any order


(1) FORMULAS FOR PARTIAL r's
We found in Table 52 that one is able by the method of partial
correlation to find the net relationship between two variables when
the influence of a third is ruled out or held constant. By an extension
of the partial correlation method, we may obtain the net correlation
between $X_1$ and $X_2$ when two or more variables have been
held constant. The partial coefficient of correlation $r_{12.34}$, for example,
means by analogy to $r_{12.3}$ that the correlation between $X_1$ and $X_2$
has been freed of the influence of both $X_3$ and $X_4$; and the partial
coefficient of correlation $r_{12.34 \ldots n}$ means that the correlation between
$X_1$ and $X_2$ has been freed of the influence of a large number of
disturbing factors.

In every partial coefficient of correlation, e.g., $r_{12.34}$, the primary
subscripts to the left of the point (1 and 2) define the two variables
whose net correlation we are seeking. The secondary subscripts to
the right of the point (3 and 4) denote the variables ruled out or held
constant. The order in which the secondary subscripts are written is
immaterial, i.e., $r_{12.34} = r_{12.43}$. The order of the primary subscripts is
of importance, however, as it tells us which variable is taken to be
dependent and which independent. The $r_{12}$ means that $X_1$ is
dependent, that is, is to be predicted from $X_2$; while $r_{21}$ means that $X_2$ is
dependent and is to be predicted from $X_1$. The numerical values $r_{12}$
and $r_{21}$ are, of course, the same. The order of a partial r is determined
by the number of its secondary subscripts. Thus $r_{12}$, an
“entire” or “total” r, is a coefficient of zero order; $r_{12.3}$ is a partial r
of the first order; $r_{12.345}$ is a coefficient of the third order.

The general formula for a partial r is

$$ r_{12.34 \ldots n} = \frac{r_{12.34 \ldots (n-1)} - r_{1n.34 \ldots (n-1)}\, r_{2n.34 \ldots (n-1)}}{\sqrt{1 - r_{1n.34 \ldots (n-1)}^2}\,\sqrt{1 - r_{2n.34 \ldots (n-1)}^2}} $$   (96)

(partial correlation coefficient in terms of the coefficients
of lower order, n variables)

From this formula partial r's of any given order may be found. In a
five-variable problem, for example, (n - 1) = 4, and n = 5, so that
$r_{12.345}$ is written

$$ r_{12.345} = \frac{r_{12.34} - r_{15.34}\, r_{25.34}}{\sqrt{1 - r_{15.34}^2}\,\sqrt{1 - r_{25.34}^2}} $$

that is, in terms of the partial r's of the second order. These second
order partial r's must then be computed by formula (96) from r's of
the first order before the third order r, $r_{12.345}$, can be evaluated. In
calculating partial r's Table I may be used to read $\sqrt{1 - r^2}$ values.
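
The step-by-step reduction of formula (96) lends itself to a recursive routine. The sketch below is one possible arrangement, not a procedure given in the text: the function `partial_r` and the dictionary layout for the zero-order r's are our own choices. It peels off the last secondary subscript at each level until only zero-order r's remain.

```python
import math

def partial_r(r, i, j, controls=()):
    """Partial r of variables i and j with `controls` held constant, formula (96).

    r maps frozenset({a, b}) to the zero-order correlation of a and b.
    """
    if not controls:
        return r[frozenset((i, j))]
    *rest, k = controls  # remove the last secondary subscript
    r_ij = partial_r(r, i, j, rest)
    r_ik = partial_r(r, i, k, rest)
    r_jk = partial_r(r, j, k, rest)
    return (r_ij - r_ik * r_jk) / (math.sqrt(1 - r_ik**2) * math.sqrt(1 - r_jk**2))

# Zero-order correlations from Table 52
r = {
    frozenset((1, 2)): 0.60,
    frozenset((1, 3)): 0.32,
    frozenset((2, 3)): -0.35,
}

r12_3 = partial_r(r, 1, 2, (3,))  # first-order partial
print(round(r12_3, 2))
```

For a problem with more variables one would add the further zero-order r's to the dictionary and lengthen the tuple of controls, e.g. `partial_r(r, 1, 2, (3, 4, 5))` for a third-order coefficient.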
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"There are several methods akin to partial correlation which are 
useful in certain special problems. Two of these, part correlation and
semi-partial correlation, may be mentioned briefly. These procedures
differ from partial correlation in that they give the net effect
secured by ruling out the influence of one or more variables from only
one of the two correlated measures, instead of from both. For example,
one may wish to know the relation (semi-partial) between reaction
time and speed of reading when differences in size of vocabulary
are held constant with respect to reading only. Part correlation
and semi-partial correlation have not been widely used in mental
measurement. For a discussion of formulas and for illustrations see
references below.*


(2) SIGNIFICANCE OF A PARTIAL r


The significance of a partial r (like that of a zero-order r) may be
tested against the null hypothesis. We may use either Table 25,
page 200, or Table J, column headed 2 variables. The degrees of freedom
for a partial r are (N - m) where N = number of cases and
m = number of variables entering into the partial r. Thus if
$r_{12.345}$ = .40 and N = 75, m = 5 and (N - m) = 75 - 5 or 70. The
.05 and .01 significance levels for this r are .23 and .30.

In Table 52, $r_{12.3}$ = .80, N = 450, m = 3, and (N - m) = 447.
From Table J, column 2, the r entries by interpolation for 447
degrees of freedom are .09 and .12 at the .05 and .01 levels. The
probability that the obtained $r_{12.3}$ of .80 arose from fluctuations of
sampling is much less than .01; and this is true, also, of $r_{13.2}$ of .71
and $r_{23.1}$ of -.72. All three partial r's, in fact, are highly significant.


2. Partial σ's of any order


(1) GENERAL FORMULAS


Just as the correlation between two sets of scores can be determined
when the influence of 1, 2, 3 . . . n factors is held constant, so
the variability (σ) of a set of scores can be computed when the
influence of 1, 2, 3 . . . n variables is ruled out. As an illustration,
consider $\sigma_{1.23}$ of Table 52. This partial σ gives the variability of $X_1$
(honor points) freed of the influence upon variability exerted by the
two factors $X_2$ (general intelligence) and $X_3$ (study hours per week).
The general formula for partial σ's of any order is

$$ \sigma_{1.234 \ldots n} = \sigma_1 \sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{13.2}^2}\,\sqrt{1 - r_{14.23}^2} \cdots \sqrt{1 - r_{1n.23 \ldots (n-1)}^2} $$   (97)

(partial σ for n variables)


* Ezekiel, M., Methods of Correlation Analysis (2nd ed.; New York: John
Wiley and Sons, 1941), p. 213.

Dunlap, J. W., and Cureton, E. E., “On the Analysis of Causation,” Journal
of Educational Psychology, 1930, 21, 657-680.


This formula may be used to compute the net σ's in correlation problems
which involve any number of variables. In a five-variable problem,
for example, $\sigma_{1.2345}$ is written

$$ \sigma_{1.2345} = \sigma_1 \sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{13.2}^2}\,\sqrt{1 - r_{14.23}^2}\,\sqrt{1 - r_{15.234}^2} $$

This partial σ is of the fourth order since it has four secondary
subscripts, and the order of a partial σ, like the order of a partial r, is
determined by the number of its secondary subscripts.

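By a simple rearrangement of the secondary subscripts, any higher
order σ may be written in more than one way. A partial σ of the
second order may be written in two ways: for example, $\sigma_{1.23}$, which is
given on page 385 as

$$ \sigma_{1.23} = \sigma_1 \sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{13.2}^2} $$

may also be written

$$ \sigma_{1.32} = \sigma_1 \sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{12.3}^2} $$

In like manner $\sigma_{2.13}$ may be written

(1) $$ \sigma_{2.13} = \sigma_2 \sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{23.1}^2} $$
or
(2) $$ \sigma_{2.31} = \sigma_2 \sqrt{1 - r_{23}^2}\,\sqrt{1 - r_{12.3}^2} $$

and $\sigma_{3.12}$ may be written

(1) $$ \sigma_{3.12} = \sigma_3 \sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23.1}^2} $$
or
(2) $$ \sigma_{3.21} = \sigma_3 \sqrt{1 - r_{23}^2}\,\sqrt{1 - r_{13.2}^2} $$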

These alternate forms of a partial σ are useful as a check upon
arithmetic calculations; also they make unnecessary the calculation
of unused partial r's. Use of the second forms of $\sigma_{2.13}$ and $\sigma_{3.12}$
instead of the first (see Table 52 for example), makes it unnecessary
to compute $r_{23.1}$ so far as the partial σ's in the regression equation
are concerned. Furthermore, if $r_{23.1}$ is not wanted for other purposes,
it need not be calculated at all (see p. 381). Two partial r's are all
that are required in order to write the regression equation of a
three-variable problem.
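
That the two forms of a partial σ agree can be verified numerically. The sketch below, with helper and variable names of our own choosing, computes both forms of the partial σ of general intelligence from the Table 52 data, carrying the partial r's unrounded.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial r, formula (96)."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Table 52 data
s2 = 15.8
r12, r13, r23 = 0.60, 0.32, -0.35

r12_3 = partial_r(r12, r13, r23)
r23_1 = partial_r(r23, r12, r13)

# Form (1) needs r23.1; form (2) needs only r12.3
first_form = s2 * math.sqrt(1 - r12**2) * math.sqrt(1 - r23_1**2)
second_form = s2 * math.sqrt(1 - r23**2) * math.sqrt(1 - r12_3**2)

print(round(first_form, 2), round(second_form, 2))
```

Both forms give the same value. It falls slightly below the 8.9 of Table 52 only because the table rounds $r_{12.3}$ to .80 before substituting.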


3. Multiple regression equations and partial regression coefficients 


(1) THE MULTIPLE REGRESSION EQUATION FOR ANY NUMBER OF VARIABLES
The regression equation which expresses the relationship between
a single dependent or criterion variable, $X_1$, and any number of
independent variables, $X_2$, $X_3$, $X_4$ . . . $X_n$, may be written in deviation
form as follows:

$$ x_1 = b_{12.34 \ldots n}\, x_2 + b_{13.24 \ldots n}\, x_3 + \cdots + b_{1n.23 \ldots (n-1)}\, x_n $$   (98)
(regression equation, deviation form, for n variables)

and in score form

$$ X_1 = b_{12.34 \ldots n}\, X_2 + b_{13.24 \ldots n}\, X_3 + \cdots + b_{1n.23 \ldots (n-1)}\, X_n + K $$   (99)
(regression equation, score form, for n variables)

The partial regression coefficients $b_{12.34 \ldots n}$, $b_{13.24 \ldots n}$, etc., give the
weights to be attached to the scores of each independent variable
when $X_1$ is to be estimated from all of these in combination.
Furthermore, the regression coefficients give the weight which each variable
exerts in determining $X_1$ when the influence of the other variables is
excluded. Hence, we can tell from the regression equation just what
role each of the several test variables plays in determining the score
on Test 1, the test taken as the criterion.


(2) THE MULTIPLE REGRESSION EQUATION FOR THREE VARIABLES (SPECIAL
FORM)
When a problem involves only three variables, the regression
equation, as we have seen, is written

$$ x_1 = b_{12.3} x_2 + b_{13.2} x_3 $$   (deviation form)

If the partial r's and the partial σ's are of no special interest, it is
possible to express the equation above in a somewhat more convenient
form for calculation, as follows:

$$ x_1 = \frac{\sigma_1 (r_{12} - r_{13} r_{23})}{\sigma_2 (1 - r_{23}^2)}\, x_2 + \frac{\sigma_1 (r_{13} - r_{12} r_{23})}{\sigma_3 (1 - r_{23}^2)}\, x_3 $$   (100)

(regression equation for three variables, special form)
or in score form

$$ X_1 = \frac{\sigma_1 (r_{12} - r_{13} r_{23})}{\sigma_2 (1 - r_{23}^2)}\, X_2 + \frac{\sigma_1 (r_{13} - r_{12} r_{23})}{\sigma_3 (1 - r_{23}^2)}\, X_3 + K $$   (101)

(regression equation for three variables, special form)

As this equation involves only zero order r's and zero order σ's,
$X_1$ may be estimated from it without the computation of any partial
r's or partial σ's. We may illustrate using the data given in Table 52,
page 381. Substituting for $\sigma_1$ = 11.2, $\sigma_2$ = 15.8, $\sigma_3$ = 6, $r_{12}$ = .60,
$r_{13}$ = .32, and $r_{23}$ = -.35, we have

$$ x_1 = \frac{11.2(.60 + .32 \times .35)}{15.8(1 - .35^2)}\, x_2 + \frac{11.2(.32 + .60 \times .35)}{6(1 - .35^2)}\, x_3 $$

$$ x_1 = .57 x_2 + 1.12 x_3 $$

which checks the regression equation as calculated in Table 52.
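
The special form (100) is easy to evaluate directly from the zero-order values. The following sketch, in which the variable names are ours, carries the arithmetic without intermediate rounding.

```python
# Zero-order values from Table 52
s1, s2, s3 = 11.2, 15.8, 6.0
r12, r13, r23 = 0.60, 0.32, -0.35

denom = 1 - r23**2

# Coefficients of x2 and x3 in formula (100)
b12_3 = s1 * (r12 - r13 * r23) / (s2 * denom)
b13_2 = s1 * (r13 - r12 * r23) / (s3 * denom)

print(round(b12_3, 3), round(b13_2, 3))
```

Carried unrounded, the first coefficient rounds to .57 as in the text. The second comes out nearer 1.13 than 1.12, so the small difference from Table 52 reflects rounding of the intermediate partial r's and σ's rather than any disagreement between the two methods.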


(3) PARTIAL REGRESSION COEFFICIENTS (b's)
Partial regression coefficients may be computed from the formula

$$ b_{12.34 \ldots n} = r_{12.34 \ldots n}\, \frac{\sigma_{1.234 \ldots n}}{\sigma_{2.134 \ldots n}} $$   (102)

(partial regression coefficients in terms of partial coefficients of
correlation and standard errors of estimate, n variables)

When the problem involves three variables, the regression coefficients,
$b_{12.3}$ and $b_{13.2}$, are, like $r_{12.3}$ and $r_{13.2}$, of the first order. The first
regression coefficient, $b_{12.3}$, equals $r_{12.3}\, \sigma_{1.23} / \sigma_{2.13}$ and the second
regression coefficient, $b_{13.2}$, equals $r_{13.2}\, \sigma_{1.23} / \sigma_{3.12}$.

Partial regression coefficients which involve more than three variables
may be calculated from formula (102). In a five-variable
problem, for example, the regression coefficients (of the third order)
are

$$ b_{12.345} = r_{12.345}\, \frac{\sigma_{1.2345}}{\sigma_{2.1345}} $$

$$ b_{13.245} = r_{13.245}\, \frac{\sigma_{1.2345}}{\sigma_{3.1245}}, \text{ etc.} $$

In order to find these partial regression coefficients we first compute
the third order partial r's and the fourth order partial σ's.

The b's are determined by the σ's of the tests and these in turn
depend upon the units in terms of which the test is scored. The
b-coefficients give the weights of scores in the independent variables,
$X_2$, $X_3$, etc., but not the contribution of these variables without
regard to the scoring system employed. The latter contribution is
given by the “beta weights,” described in (4) below.


(4) THE BETA (β) COEFFICIENTS


When expressed in terms of σ-scores, partial regression coefficients
are usually called beta coefficients. The beta coefficients may be
calculated directly from the b's as follows:

$$ \beta_{12.34 \ldots n} = b_{12.34 \ldots n}\, \frac{\sigma_2}{\sigma_1} $$   (103)
(beta coefficients calculated from partial regression coefficients)

The multiple regression equation for n variables may also be written
in σ-scores as

$$ z_1 = \beta_{12.34 \ldots n}\, z_2 + \beta_{13.24 \ldots n}\, z_3 + \cdots + \beta_{1n.23 \ldots (n-1)}\, z_n $$   (104)
(multiple regression equation in terms of σ-scores)

Beta coefficients are often called “beta weights” to distinguish them
from the “score weights” (b's) of the ordinary multiple regression
equation. When all of our tests have been expressed in
σ-scores (all Means = .00 and all σ's = 1.00) differences in test units
as well as differences in variability are allowed for. We are then able
to determine from the correlations alone the relative weight with
which each independent variable “enters in” or contributes to the
criterion, independently of the other factors.

To illustrate with the data in Table 52, we find that $\beta_{12.3}$
= .57 × 15.8/11.2 or .81 and that $\beta_{13.2}$ = 1.12 × 6/11.2 or .60. From (104)
above we get

$$ z_1 = .81 z_2 + .60 z_3 $$

This equation should be compared with the multiple regression
equation $x_1 = .57 x_2 + 1.12 x_3$ in Table 52 which gives the weights to
be attached to the scores in $X_2$ and $X_3$. The weights of .57 and 1.12
tell us the amount by which scores in $X_2$ and $X_3$ must be multiplied
in order to give the “best” prediction of $X_1$. But these weights do not
give us the relative importance of general intelligence and study
habits in determining the number of honor points a freshman will
receive. This information is given by the beta weights. It is of interest
to note that while the actual score weights are as 1:2 (.57 to 1.12),
the independent contributions of general intelligence ($z_2$) and study
habits ($z_3$) are in the ratio of .81 to .60 or as 4:3. When the variabilities
(σ's) of our tests are all equal and scoring units are comparable,
general intelligence has a proportionately greater influence than
study habits in determining academic achievement. This is certainly
the result to be expected.
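
The conversion from score weights to beta weights by formula (103) can be sketched as follows. Here the b's are first obtained from the zero-order values by the special form (100) and are not rounded before conversion; this is a choice of ours, made so that the betas can be compared with the text's .81 and .60.

```python
# Zero-order values from Table 52
s1, s2, s3 = 11.2, 15.8, 6.0
r12, r13, r23 = 0.60, 0.32, -0.35

# Score weights (b's) from formula (100), left unrounded
denom = 1 - r23**2
b12_3 = s1 * (r12 - r13 * r23) / (s2 * denom)
b13_2 = s1 * (r13 - r12 * r23) / (s3 * denom)

# Beta weights, formula (103): beta = b * (sigma of predictor / sigma of criterion)
beta12_3 = b12_3 * s2 / s1
beta13_2 = b13_2 * s3 / s1

print(round(beta12_3, 2), round(beta13_2, 2), round(beta12_3 / beta13_2, 2))
```

The ratio of the two betas comes out close to 4:3, which is the comparison of relative importance drawn in the text.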


4. The standard error of estimate for multiple regression equations 


All $X_1$ scores estimated from a multiple regression equation have a
standard error of estimate which measures the error made in taking
scores given by the regression equation instead of actual scores (those
earned on the criterion test). The standard error of estimate is given
directly by $\sigma_{1.234 \ldots n}$ as follows

$$ \sigma_{(est.\,X_1)} = \sigma_{1.234 \ldots n} $$   (105)

(standard error of estimate for n variables)

Since $\sigma_{1.234 \ldots n}$ must be computed in order to evaluate the partial
regression coefficients (p. 390), $\sigma_{(est.\,X_1)}$ is always calculated in the
course of the problem. In Table 52, the $\sigma_{(est.\,X_1)}$ of a prediction of
honor points is 6.3. The chances are about seven in ten or two in
three that the “most probable” honor point score forecast for any
student will be in error by 6 points or less.

It is worth while examining further into the meaning of $\sigma_{(est.\,X_1)}$.
This standard error of estimate equals $\sigma_{1.23}$; and the latter indicates
the effect upon the variability of Test 1 (honor points) obtained by
eliminating (or holding constant) the influence of Tests 2 and 3
(general intelligence and study effort). The smaller $\sigma_{1.23}$ is with
respect to $\sigma_1$, the greater the influence exerted by our two factors
upon Test 1’s variability. In Table 52 it is clear that in ruling out
the variability in Test 1 attributable to Tests 2 and 3, we reduce $\sigma_1$
from 11.2 to 6.3 ($\sigma_{1.23}$) or by nearly one-half. This means that
students alike in general intelligence and in study habits differ much less
in scholastic achievement than do students in general.

From the multiple regression equation $X_1 = .57 X_2 + 1.12 X_3 - 66$
(see p. 381), $X_1$ (honor points) can be predicted with a smaller error
of estimate than from any other linear equation. Put differently, the
standard error of estimate is a minimum when the regression equation
is used to estimate $X_1$ scores.* Hence, the values of $X_1$ predicted
from the multiple regression equation are the “best estimates”
of the actual $X_1$ values which can be made from a linear equation
containing the given variables.
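
Two of the figures quoted in this section can be checked quickly. The sketch assumes, as the “two in three” statement implicitly does, that errors of estimate are normally distributed; the variable names are ours.

```python
import math

sigma_1 = 11.2   # SD of honor points (Table 52)
se_est = 6.3     # standard error of estimate, equal to the partial sigma

# Proportional reduction in variability when X2 and X3 are held constant
reduction = 1 - se_est / sigma_1

# Chance that a normally distributed error falls within one standard error
p_within_one_se = math.erf(1 / math.sqrt(2))

print(round(reduction, 2), round(p_within_one_se, 2))
```

The reduction is a little under one-half, and the probability is about two in three, in line with the statements above.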


5. The coefficient of multiple correlation, R 


(1) GENERAL FORMULAS 


The correlation between a single dependent or criterion variable
$X_1$ and (n - 1) independent variables combined by means of a
multiple regression equation is given by the formula

$$ R_{1(234 \ldots n)} = \sqrt{1 - \frac{\sigma_{1.234 \ldots n}^2}{\sigma_1^2}} $$   (106)
(multiple correlation coefficient in terms of partial
σ's, n variables)
in which

$R_{1(234 \ldots n)}$ = the coefficient of multiple correlation
$\sigma_1$ = the standard deviation of the criterion ($X_1$) scores
$\sigma_{1.234 \ldots n}$ = the variability left in Test 1 when the variability of
    Tests 2, 3 ... n is held constant through partial
    correlation.


When there are only three variables, the multiple coefficient of
correlation becomes

$$ R_{1(23)} = \sqrt{1 - \frac{\sigma_{1.23}^2}{\sigma_1^2}} $$

when there are five variables

$$ R_{1(2345)} = \sqrt{1 - \frac{\sigma_{1.2345}^2}{\sigma_1^2}} $$

If we replace $\sigma_{1.234 \ldots n}$ in formula (106) by its value in terms of
the entire and partial r's [see formula (97)] we may write the general
formula for $R_{1(234 \ldots n)}$ as follows:

$$ R_{1(234 \ldots n)} = \sqrt{1 - [(1 - r_{12}^2)(1 - r_{13.2}^2) \cdots (1 - r_{1n.23 \ldots (n-1)}^2)]} $$   (107)
(multiple coefficient of correlation in terms of partial coefficients
of correlation, n variables)

* Yule, G. U., and Kendall, M. G., An Introduction to the Theory of
Statistics, op. cit., pp. 262-267.

Since a higher order σ may be written in a variety of ways, the number
depending upon its order (see p. 389), there are several alternate
forms for R. These serve as valuable means of checking the accuracy
of our arithmetical calculations. In a three-variable problem, for
example, $R_{1(23)}$ may be written as

$$ R_{1(23)} = \sqrt{1 - [(1 - r_{12}^2)(1 - r_{13.2}^2)]} $$

or as

$$ R_{1(23)} = \sqrt{1 - [(1 - r_{13}^2)(1 - r_{12.3}^2)]} $$

The standard error of estimate is a minimum when the multiple
regression equation is employed in estimating $X_1$ scores (p. 395).
Hence the multiple coefficient of correlation, R, is the maximum
correlation obtainable between actual $X_1$ scores and $X_1$ scores estimated
from a knowledge of the variables $X_2$, $X_3$ ... $X_n$ in the regression
equation. The truth of this statement is contingent upon linearity of
regression in all of the correlations. R indicates how accurately a
given combination of variables represents the actual values of $X_1$
(the criterion) when our test scores are combined in accordance with
the “best” linear equation.


(2) MULTIPLE В іх TERMS OF В COEFFICIENTS 


$R^2$ may be expressed in terms of the beta coefficients and the zero
order r's:

$$R^2_{1(23 \ldots n)} = \beta_{12.34 \ldots n}\, r_{12} + \beta_{13.24 \ldots n}\, r_{13} + \cdots + \beta_{1n.23 \ldots (n-1)}\, r_{1n} \qquad (108)$$

(multiple $R^2$ in terms of β coefficients and zero order r's)

For three variables (108) becomes

$$R^2_{1(23)} = \beta_{12.3}\, r_{12} + \beta_{13.2}\, r_{13}$$

From page 393 we find $\beta_{12.3}$ = .81 and $\beta_{13.2}$ = .60; and from Table
52 that $r_{12}$ = .60 and $r_{13}$ = .32. Substituting in (108) above, we get

$R^2_{1(23)}$ = .81 × .60 + .60 × .32
             = .49 + .19

$R^2_{1(23)}$ = .68

$R_{1(23)}$ = .82
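The substitution into formula (108) can be repeated in a few lines. This is a minimal sketch; the function name is hypothetical, and the β's and r's are the values quoted just above.

```python
import math

def r_squared_from_betas(betas, rs):
    """Formula (108): R^2 is the sum of beta times zero-order r over the predictors."""
    return sum(b * r for b, r in zip(betas, rs))

betas = [0.81, 0.60]   # beta_12.3 and beta_13.2, as quoted in the text
rs = [0.60, 0.32]      # r_12 and r_13, as quoted in the text
r2 = r_squared_from_betas(betas, rs)
print(round(r2, 2), round(math.sqrt(r2), 2))
```

The two products in the sum are the separate terms (.49 and .19) that the text goes on to interpret.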


$R^2_{1(23 \ldots n)}$ gives the proportion of the variance of the criterion
measure ($X_1$) attributable to the joint action of the variables $X_2$,
$X_3$ ... $X_n$. As shown above, $R^2_{1(23)}$ = .68; and, accordingly, 68% of
whatever makes freshmen differ in (1) school achievement can be
attributed to differences in (2) general intelligence and (3) study
habits. By means of formula (108) the total contribution of .68 can
be broken down further into the independent contributions of general
intelligence ($X_2$) and study habits ($X_3$). Thus from the equation
$R^2_{1(23)}$ = .49 + .19, we know that 49% is the contribution of general
intelligence to the variance of honor points, and 19% is the contribution
of study habits. The remaining 32% of the variance of $X_1$ must
be attributed to factors not measured in our problem.


(3) THE SIGNIFICANCE OF R


Multiple R is positive,* always less than 1.00, and always greater
than the correlation coefficients $r_{12}$, $r_{13}$, ... $r_{1n}$. The significance of an
R can best be tested, perhaps, against the null hypothesis by means
of Table J. This table must be entered with the number of variables
(m) in the problem and with (N - m) degrees of freedom. To illustrate
with Table 52, R = .83, N = 450, m = 3 and (N - m) = 450 - 3
or 447. From the column headed "3" in Table J we read that for 447
degrees of freedom the R's at the .05 and .01 levels (by interpolation)
are .116 and .143. Only once in twenty trials would an R of .116 arise
by sampling fluctuations on the null hypothesis, and only once in 100
trials would an R of .143 occur. As our R is very much larger than
.143, it is highly significant. Table J may be used with problems
involving up to nine variables. Suppose that $R_{1(2345)}$ = .526 and
N = 40. From the column headed "5 variables" in Table J, we find
that for 40 - 5 or 35 degrees of freedom, the R's are .482 and .556 at
the .05 and .01 levels. The obtained R is significant, therefore, at
the .05, but not at the .01, level.
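Table J is not reproduced here. As an assumption, and not as the book's own procedure, the same question can be put in the form of the usual variance ratio for a multiple R, with (m - 1) and (N - m) degrees of freedom; the resulting F would then be compared with a tabled F. The function name is hypothetical.

```python
def f_ratio_for_multiple_r(r, n, m):
    """F = [R^2 / (m - 1)] / [(1 - R^2) / (N - m)], m = number of variables."""
    r2 = r * r
    return (r2 / (m - 1)) / ((1 - r2) / (n - m))

f_table_52 = f_ratio_for_multiple_r(0.83, 450, 3)     # three-variable example
f_five_vars = f_ratio_for_multiple_r(0.526, 40, 5)    # five-variable example
print(round(f_table_52, 1), round(f_five_vars, 2))
```

The first ratio is very large and the second is modest, which matches the verbal conclusions drawn from Table J in the two examples.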


6. Factors determining the selection of tests in a battery


The effectiveness with which the composite score obtained from a
battery of tests measures the criterion depends (1) upon the
intercorrelations of the tests in the battery as well as (2) upon the
correlations of these tests with the criterion, that is, their validity
coefficients. This appears clearly in Table 53 in which the criterion
correlation of each test is .30, but the intercorrelations of the tests
of the battery vary from .00 to .60. When the tests are uncorrelated
(all criterion r's being .30), an increase in size of the battery from
1 to 9 tests raises multiple R from .30 to .90. However, when the
intercorrelations of the tests are all .60 and the battery is increased
in size from 1 to 9 tests, multiple R goes from .30 to .37. Even when
the number of tests in the battery is 20, multiple R is only .38.

* Since R is always taken as positive, chance errors are cumulative and may
be large if the sample is small and the number of variables large. For the
correction of R for chance errors, see formula (109), page 407.


TABLE 53* Effect of intercorrelations on multiple correlation

Multiple R's for different numbers of tests, when criterion correlations
(validities) of all tests are .30, and the intercorrelations are the same and vary
from .00 to .60. Example: In a battery of 4 tests, all with validities of .30 and
intercorrelations of .30, multiple R is .44.

Number of Tests        Size of Intercorrelations
                       .00     .10     .30     .60
       1               .30     .30     .30     .30
       2               .42     .40     .37     .34
       4               .60     .53     .44     .36
       9               .90     .67     .48     .37
      20                †      .79     .52     .38
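The entries of Table 53 can be approximated from the standard result for k tests that share one validity $r_v$ and one intercorrelation $r_i$, namely $R^2 = k r_v^2 / [1 + (k - 1) r_i]$. This formula is not stated in the text and is brought in here as an assumption; the function name is hypothetical.

```python
import math

def multiple_r_squared(k, validity, intercorr):
    """R^2 for k tests with equal validities and equal intercorrelations."""
    return k * validity**2 / (1 + (k - 1) * intercorr)

for k in (1, 2, 4, 9, 20):
    row = []
    for ri in (0.00, 0.10, 0.30, 0.60):
        r2 = multiple_r_squared(k, 0.30, ri)
        # An R^2 above 1 means such a set of correlations cannot exist.
        row.append("  - " if r2 > 1 else f"{math.sqrt(r2):.2f}")
    print(k, *row)
```

Under this formula the printed values are matched to within about .01, and the impossible case of 20 uncorrelated tests shows up as an $R^2$ greater than 1.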


A single test can add to the validity of a battery by "taking out"
some of the as yet unmeasured part of the criterion. Such a test will
show a high r with the criterion but relatively low r's with the other
tests in the battery. (See Table 53 and Fig. 61.) Usually it is
difficult to find tests, after the first 4 or 5, which fulfill these
requirements. In most group tests of general intelligence where the criterion
is relatively homogeneous (ability to deal with abstract verbal
relations, say) the sub-tests of a battery may exhibit high
intercorrelations. This is true to a lesser degree of educational achievement tests
and of many tests of aptitudes. When the criterion is a complex made
up of a number of variables (job performance, success in salesmanship,
or professional competence) it is easier to find tests of acceptable
validity which will show low relationships with the other tests of
the battery. But even here the maximum multiple R is often reached
rather quickly (see p. 407).


FIG. 61 


* From R. L. Thorndike, Personnel Selection (New York: John Wiley and
Sons, 1949), p. 191.

† It is mathematically impossible for 20 tests all to correlate 0.30 with some
measure and still have zero intercorrelations.
A test may also add to the validity of a battery by acting as a
"suppressor" variable. Suppose that Test A correlates .50 with a
criterion (has good validity) while Test B correlates only .10 with
the criterion but .60 with Test A. The multiple R is .56 despite the low
validity of Test B. This is because Test B acts as a suppressor: it
takes out some of Test A's "non-valid" variance, thus raising the
criterion correlation of the battery.* (See Fig. 62.) The weights of
these two tests in the regression equation connecting the criterion
with A and B are .69 and -.31. The negative weight of Test B serves
to suppress that part of Test A not related to the criterion and thus
gives a better (more valid) measure of the criterion than can be
obtained with Test A alone.
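The suppressor figures can be reproduced from the three correlations alone. The sketch below uses the usual two-predictor formulas for the standardized weights and then sums weight times criterion r to get $R^2$; the function name is hypothetical.

```python
import math

def two_predictor_solution(r_ca, r_cb, r_ab):
    """Beta weights and multiple R for a criterion and two tests A and B."""
    denom = 1 - r_ab**2
    beta_a = (r_ca - r_cb * r_ab) / denom
    beta_b = (r_cb - r_ca * r_ab) / denom
    r_mult = math.sqrt(beta_a * r_ca + beta_b * r_cb)
    return beta_a, beta_b, r_mult

# Test A: .50 with the criterion; Test B: .10 with the criterion; A with B: .60
beta_a, beta_b, r_mult = two_predictor_solution(0.50, 0.10, 0.60)
print(beta_a, beta_b, r_mult)
```

The weight for Test B comes out negative even though its own criterion correlation is positive, which is the mark of a suppressor.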


FIG. 62 


IV. Spurious Correlation 


The correlation between two sets of test scores is said to be spuri- 
ous when it is due in some part, at least, to factors other than those 
which determine performance in the tests themselves. In general, the 
cause of spurious correlation lies in a failure to control conditions; 
and the most usual effect of this lack of control is a "boosting" or 
inflation of the coefficient. Some of the situations which may lead to 
spurious correlation will be given in this section. 


1. Spurious correlation arising from heterogeneity 


We have shown elsewhere (p. 378) how a lack of uniformity in age
conditions will lead to correlations which are spuriously high. Failure
to take account of heterogeneity introduced by the age factor is
a prolific source of error in correlational work. To cite an example,
within a group of boys ten to eighteen years old, a substantial
correlation will appear between strength of grip and memory span, quite
apart from any intrinsic relationship, due solely to the fact that both
variables increase with age. In stating the correlation between two
tests, or the reliability coefficient of a test, one should always be
careful to specify the range of ages, grades included, and other data
bearing upon physical, mental, and cultural differences, in order to
show the degree of heterogeneity in the group. Without this information,
the r may be of little value.

* See also Table 52. Here Tests 2 and 3 take out relatively distinct parts of
1 (the criterion); they are negatively correlated, so that $R_{1(23)}$ (.83) is
significantly increased over $r_{12}$ (.60).

Heterogeneity is introduced by other factors than age. If alcoholism,
degeneracy, and bad heredity are all positively related, the r
between alcoholism and degeneracy will be too high (because of the
effect of heredity upon both factors) unless heredity can be "held
constant." Again, assume that we have measured two distinctly
different groups, 500 college seniors and 500 day laborers, upon a
cancellation test and upon a general intelligence test. The mean ability
in both tests will be definitely higher in the college group. Now
even if the correlation between the two tests is zero within each group
taken separately, if the two groups are combined a positive correlation
will appear because of the heterogeneity of the group with respect
to age, intelligence, and educational background. Such a correlation
is, of course, spurious.*

To be a valid measure of relationship, a correlation coefficient 
must be freed of the extraneous influences which affect the relation- 
ship between the variables concerned. This may be accomplished 
(1) by selecting samples or groups in which age, or whatever the 
factor to be controlled, is constant; or (2) by using partial correla- 
tion when the factor to be controlled can be measured and its corre- 
lation with the variables studied can be calculated. 
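The second remedy, partial correlation, can be illustrated with the first-order formula. The correlations below are hypothetical and were chosen only to show how a sizable r between two measures can shrink almost to zero once a common factor such as age is held constant.

```python
import math

def partial_r(r12, r13, r23):
    """r_12.3: correlation of variables 1 and 2 with variable 3 held constant."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13**2) * (1 - r23**2))

# Hypothetical values: grip (1) and memory span (2) correlate .50 with each
# other, and each correlates .70 with age (3).
net_r = partial_r(0.50, 0.70, 0.70)
print(round(net_r, 3))
```

With these assumed figures nearly all of the original .50 is traceable to the common dependence upon age.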


2. Spurious index correlation †


Even when three variables, $X_1$, $X_2$, and $X_3$, are uncorrelated, a
correlation between the indices $Z_1$ and $Z_2$ (where $Z_1 = X_1/X_3$ and
$Z_2 = X_2/X_3$) may appear which is as large as .50. To illustrate, if
two individuals observe a series of magnitudes (e.g., Galton bar
settings) independently, the absolute errors of observation ($X_1$ and $X_2$)
may be uncorrelated, and still an appreciable correlation appear
between the errors made by the two observers, when these are
expressed as percents of the observed magnitudes ($X_3$). The spurious
element here, of course, is the common factor $X_3$ in the denominator
of the ratios.

* Garrett, H. E., and Anastasi, A., "The Tetrad-Difference Criterion and the
Measurement of Mental Traits," Annals New York Academy of Sciences, 1932,
33, 233-282.

† Yule, G. U., An Introduction to the Theory of Statistics, op. cit., pp. 215-

Thomson, G. H., and Pintner, R., "Spurious Correlation and Relationship
between Tests," Journal of Educational Psychology, 1924, 15, 433-444.
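A small simulation makes the index effect concrete. The sketch below draws three independent variables with the same mean and spread (an assumption made so that the spurious r lands near .50), forms the two ratios with a common denominator, and correlates them. The seed, sample size, and helper name are arbitrary choices, not part of the text.

```python
import random

def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

random.seed(0)  # arbitrary seed, for repeatability only
n = 20000
x1 = [random.gauss(100, 10) for _ in range(n)]
x2 = [random.gauss(100, 10) for _ in range(n)]
x3 = [random.gauss(100, 10) for _ in range(n)]
z1 = [a / c for a, c in zip(x1, x3)]   # index Z1 = X1 / X3
z2 = [b / c for b, c in zip(x2, x3)]   # index Z2 = X2 / X3
r_raw = pearson_r(x1, x2)
r_index = pearson_r(z1, z2)
print(round(r_raw, 2), round(r_index, 2))
```

The raw variables stay essentially uncorrelated while the two indices correlate substantially, solely because they share a denominator.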

One of the commonest examples of a spurious index relationship
in psychology is found in the correlation of I.Q.'s or E.Q.'s obtained
from intelligence and achievement tests. If the I.Q.'s of 500 children
ranging in age from three to fourteen years are calculated from two
tests $X_1$ and $X_2$, the correlation is between
$\frac{\text{M.A.}_1}{\text{C.A.}}$ and $\frac{\text{M.A.}_2}{\text{C.A.}}$. If C.A.
were a constant (the same for all children) it would have no effect
on the correlation and we would simply be correlating M.A.'s. But
when C.A. varies from child to child there is usually a correlation
between C.A. and M.A. which tends to increase the r between I.Q.'s,
sometimes considerably.


3. Spurious correlation between averages 


Spurious correlation usually results when the average scores made
by a number of different groups on a given test are correlated against
the average scores made by the same groups on a second test. An
example is furnished by the correlations reported between mean
intelligence test scores, by states, and such "educational" factors as
number of schools, books sold, magazines circulated in the states, etc.
Most of these correlations are high, many above .90. If average
correlations by states are compared with the correlations between
intelligence scores and number of years spent in school within the
separate states, these latter r's are usually much lower. Correlations
between averages become "inflated" because a large number
of factors which ordinarily reduce the correlation within a single
group cancel out when averages are taken from group to group.
Average intelligence test scores, for instance, increase regularly as
we go up the occupational scale from day laborer to the professions;
but the correlation between intelligence and status (training, salary,
etc.) at a given occupational level is far from perfect.


PROBLEMS 


1. The correlation between a general intelligence test and school
achievement in a group of children from eight to fourteen years old is .80. The
correlation between the general intelligence test and age in the same
group is .70; and the correlation between school achievement and age is
.60. What is the correlation between general intelligence and school
achievement in children of the same age? Comment upon your result.

2. In a group of 100 college freshmen, the correlation between (1)
intelligence and (2) the A-cancellation test is .20. The correlation between
(1) intelligence and (3) a battery of controlled association tests in the
same group is .70. If the correlation between (2) cancellation and (3)
controlled association is .45, what is the "net" correlation between
intelligence and cancellation in this group? Between intelligence and
controlled association? Interpret your results.

3. Explain why some variables are of such a nature that it is difficult to
hold them "constant," and hence to employ them in problems involving
partial correlation.


4. Given the following data for fifty-six children:

   $X_1$ = Stanford-Binet I.Q.
   $X_2$ = Memory for Objects
   $X_3$ = Cube Imitation

   $M_1$ = 101.71      $M_2$ = 10.06      $M_3$ = 3.35
   $\sigma_1$ = 13.05      $\sigma_2$ = 3.06      $\sigma_3$ = 2.02
   $r_{12}$ = .41      $r_{13}$ = .50      $r_{23}$ = .16

   (a) Work out the regression equation of $X_2$ and $X_3$ upon $X_1$, using the
       method of Section II.

   (b) Compute $R_{1(23)}$ and $\sigma_{(est.\ X_1)}$.

   (c) If a child's score is 12 in Test $X_2$ and 4 in Test $X_3$, what is his most
       probable score in $X_1$ (I.Q.)?

5. Let $X_1$ be a criterion and $X_2$ and $X_3$ be two other tests. Correlations
and $\sigma$'s are as follows:

   $r_{12}$ = .60      $\sigma_1$ = 5.00
   $r_{13}$ = .50      $\sigma_2$ = 10.00
   $r_{23}$ = .20      $\sigma_3$ = 8.00

   How much more accurately can $X_1$ be predicted from $X_2$ and $X_3$ than
   from either alone?


6. Given a team of two tests, each of which correlates .50 with a criterion.
If the two tests correlate .20,

   (a) How much would the addition of another test which correlates .50
       with the criterion and .20 with each of the other tests improve the
       predictive value of the team?

   (b) How much would the addition of two such tests improve the
       predictive value of the team?

7. Test A correlates .60 with a criterion and .50 with Test B, which
correlates only .10 with the criterion. What is the multiple R of A and B
with the criterion? Why is it higher than the correlation of A with the
criterion?

8. Two absolutely independent tests B and C completely determine the
criterion A. If B correlates .50 with A, what is the correlation of C and
A? What is the multiple correlation of A with B and C?

9. Comment upon the following statements:

   (a) It is good practice to correlate E.Q.'s achieved upon two educational
       achievement tests, no matter how wide the age range.

   (b) The positive correlation between average AGCT scores by states
       and the average elevation of the states above sea level proves
       the close relationship of intelligence and geography.

   (c) The correlation between memory test scores and tapping rate in a
       group of 200 eight-year-old children is .20; and the correlation
       between memory test scores and tapping rate in a group of 100 college
       freshmen is .10. When the two groups are combined the correlation
       between these two tests becomes .40. This shows that we must have
       large groups in order to get high correlations.


ANSWERS 


1. r = .67

2. r (intelligence and cancellation) = -.18; r (intelligence and controlled
   association) = .70

4. (a) $\bar{X}_1$ = 1.47$X_2$ + 2.98$X_3$ + 76.95
   (b) $R_{1(23)}$ = .60; $\sigma_{(est.\ X_1)}$ = 10.93
   (c) 106.50 or 107

5. From $X_2$ alone, $\sigma_{(est.\ X_1)}$ = 4.0
   From $X_3$ alone, $\sigma_{(est.\ X_1)}$ = 4.3
   From $X_2$ and $X_3$, $\sigma_{(est.\ X_1)}$ = 3.5

6. (a) R increases from .64 to .73
   (b) R increases from .64 to .79

7. $R_{c(AB)}$ = .65

8. $r_{AC}$ = .87; $R_{A(BC)}$ = 1.00


16


MULTIPLE CORRELATION IN TEST SELECTION


I. The Wherry-Doolittle Test Selection Method * 


The method of solving multiple correlation problems outlined in
Section II and Table 52 of Chapter 15 is adequate enough when
there are only three (or not more than four) variables. In problems
involving more than four variables, however, the mechanics of
calculation become almost prohibitive unless some systematic scheme
of solution is adopted (p. 387). The Wherry-Doolittle test selection
method, to be presented in this section, provides a method of solving
certain types of multiple correlation problems with a minimum of
statistical labor. This method selects the tests of the battery
analytically and adds them one at a time until a maximum R is obtained.
To illustrate, suppose we wish to predict aptitude for a certain
technical job in a factory. Criterion ratings for job proficiency have been
obtained and eight tests tried out as possible indicators of job
aptitude. By use of the Wherry-Doolittle method we can (1) select those
tests (e.g., three or four) which yield a maximum R with the criterion
and discard the rest; (2) calculate the multiple R after the addition
of each test, stopping the process when R no longer increases; (3)
compute a multiple regression equation from which the criterion can
be predicted with the highest precision of which the given list of tests
is capable.

The application of the Wherry-Doolittle test selection method to
an actual problem is shown in Example (1) below. Steps in
computation are outlined in order and are illustrated by reference to the
data of Example (1), so that the student may follow the process in
detail.

* Stead, W. H., Shartle, C. L., et al., Occupational Counseling Techniques,
op. cit., Appendix 5.
1. Solution of a multiple correlation problem by the Wherry-Doolittle Test
Selection Method


Example (1) In Table 54 are presented the intercorrelations
of ten tests administered in the Minnesota study of Mechanical
Ability. The criterion, called the "quality" criterion, was a measure
of the excellence of mechanical work done by 100 junior high-school
boys. The tests in Table 54 are fairly representative of the
wide range of measures used in the Minnesota study. Our immediate
problem is to choose from among these variables the most
valid battery of tests, i.e., those tests which will predict the
criterion most efficiently. Selection of tests is made by the
Wherry-Doolittle method.


TABLE 54 Intercorrelations of ten tests and a criterion


(Data from the Minnesota Study of Mechanical Ability *)


List of Tests (N = 100)
 C = Quality criterion
 1 = Packing blocks
 2 = Card sorting 1
 3 = Minnesota spatial relations boards, A, B, C, D
 4 = Paper form boards, A and B
 5 = Stenquist Picture I
 6 = Stenquist Picture II
 7 = Minnesota assembly boxes, A, B, C
 8 = Mechanical operations questionnaire
 9 = Interest analysis blank
10 = Otis intelligence test


1 2 3 4 5 


26 19 53 52 24 
52 .34 14 18 
-23 14 .0 


8 

S 
REBPRRO 
BB5EREER- 
BsREREBR 


9 
55 
34 
28 
55 
61 
23 
13 
41 
25 


LII 


Steps in the solution of Example (1) may be outlined in order. 


Step 1


Draw up work sheets like those of Tables 55 and 56. The correlation
coefficients between tests and criterion are entered in Table 54.


* Paterson, D. G., Elliott, R. M., et al., Minnesota Mechanical Ability Tests
(Minneapolis: The University of Minnesota Press, 1930), Appendix 4.
Step 2


Enter these coefficients with signs reversed in the $V_1$ row of
Table 55.* The numbers heading the columns refer to the tests.


TABLE 55

                                       Tests
          1      2      3      4      5      6      7      8      9     10
V1     -.260  -.190  -.530  -.520  -.240  -.310  -.550  -.300  -.550  -.200
V2     -.095  -.118  -.222  -.250   .013  -.090         -.080  -.324  -.188
V3     -.010  -.049  -.097  -.091   .099  -.103         -.047         -.061
V4      .005  -.034         -.057   .054  -.072         -.053         -.056
V5     -.012  -.039                 .062  -.065         -.051         -.018
V6     (-.065)*

Step 3


Enter the numbers 1.000 in each column of the row $Z_1$ in Table 56.


TABLE 56

                                       Tests
          1      2      3      4      5      6      7      8      9     10
Z1     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
Z2      .910   .983   .686   .760   .788   .840          .840   .832   .983
Z3      .853   .945   .563   .559   .786   .839          .831          .854
Z4      .839   .931          .489   .748   .782          .829          .852
Z5      17%    .927                 2187   .775          .829          .637

1/.832 = 1.202

1/.563 = 1.776

1/.489 = 2.045


Step 4


Select that test having the highest $V_1^2/Z_1$ quotient as the first test of
the battery. From Tables 55 and 56 we find that Tests 7 and 9 both
have correlations of .550 with the criterion, and that these are the
largest r's in the table. Either Test 7 or Test 9 could be selected as
the first test of our battery. We have chosen Test 7 because it is the
more objective measure of performance.

* Correlation coefficients are assumed to be accurate to three or to four
decimals in subsequent calculations to avoid the loss of precision which results
when decimals are rounded to two places. (See p. 20.)


Step 5

Apply the Wherry shrinkage formula

$$\bar{R}^2 = 1 - K^2 \frac{(N - 1)}{(N - m)} \qquad (109)$$

in which $\bar{R}$ is the "shrunken" multiple correlation coefficient, the
coefficient from which chance error has been removed.* This corrected
$\bar{R}$ may be calculated in a systematic way as follows:


(1) Prepare a work sheet similar to that shown in Table 57.


TABLE 57

  a       b         c            d             e          f          g
  m    $V^2/Z$    $K^2$    (N-1)/(N-m)    $\bar{K}^2$  $\bar{R}^2$  $\bar{R}$    Test #
  0               1.000     (N = 100)
  1     .3025     .6975       1.000         .6975      .3025      .5500       7
  2     .1261     .5714       1.010         .5771      .4229      .6503       9
  3     .0167     .5547       1.021         .5663      .4337      .6586       3
  4     .0066     .5481       1.031         .5651      .4349      .6595       4
  5     .0054     .5427       1.042         .5655      .4345      .6591       6


(2) Enter 1.000 in column c, row 0, under $K^2$. Enter N = 100 in
column d.

(3) Enter the quotient $V_1^2/Z_1$ in column b, row 1:
$V_1^2/Z_1$ = $(-.550)^2$/1.000 = .3025.†

(4) Subtract .3025 from 1.000 to give .6975 as the entry in column c
under $K^2$.

(5) Find the quotient (N - 1)/(N - m) and record it in column d.
(N - 1) = 99; and since m (number of tests selected) is 1,
(N - m) also equals 99 and (N - 1)/(N - m) = 1.000.

(6) Write the product of columns c and d in column e: .6975 ×
1.000 = .6975.

(7) Subtract the column e entry from 1.000 to obtain $\bar{R}^2$ (the
shrunken multiple correlation coefficient) in column f. In
Table 57 the $\bar{R}^2$ entry, of course, is .3025.

(8) Find the square root of the column f entry and enter the result
in column g under $\bar{R}$. Our entry is .5500, the correlation of
Test 7 with the criterion. No correction for chance errors is
necessary for one test.

* Wherry, R. J., "A New Formula for Predicting the Shrinkage of the
Coefficient of Multiple Correlation," Annals of Mathematical Statistics, 1931, Vol. 2,
440-451.

† Quotient is taken to four decimals (p. 406).

Step 6


To aid in the selection of a second test to be added to our battery
of one, a work sheet similar to that shown in Table 58 should be
prepared. Calculations in Table 58 are as follows:


(1) Leave $a_1$ row blank.

(2) Enter in row $b_1$ the correlations of Test 7 (first selected test)
with each of the other tests in Table 54. These r's are .300,
.130, .560, etc., and are entered in the columns numbered to
correspond to the tests. Enter 1.000 in the column for Test 7.
In column -C enter the correlation of Test 7 with the criterion
with sign reversed, i.e., as -.550.

(3) Write the algebraic sum of the $b_1$ entries in the "Check Sum"
column. This sum is 3.730.

(4) Multiply each $b_1$ entry by the negative reciprocal of the $b_1$
entry for Test 7, the first selected test. Enter these products
in the $c_1$ row. Since the negative reciprocal of Test 7's $b_1$
entry is -1.000, we need simply write the $b_1$ entries in the $c_1$
row with signs reversed.

Step 7


Draw a vertical line under Test 7 in Table 55 to show that it has
been selected. To select a second test proceed as follows:


(1) To each $V_1$ entry in Table 55, add algebraically the product
of the $b_1$ entry in the criterion (-C) column of Table 58 by
the $c_1$ entry for each of the other tests. Enter results in the
$V_2$ row. The formula for $V_2$ is $V_2 = V_1 + b_1$ (criterion) × $c_1$
(each test). To illustrate, from Table 58 and Table 55 we have

For Test 1: $V_2$ = -.260 + (-.550) × (-.300) = -.260 + .165 = -.095

For Test 4: $V_2$ = -.520 + (-.550) × (-.490) = -.520 + .270 = -.250

For Test 9: $V_2$ = -.550 + (-.550) × (-.410) = -.550 + .226 = -.324


(2) To each $Z_1$ in Table 56 add algebraically the product of the $b_1$
and $c_1$ entries for each test got from Table 58. Enter these
results in the $Z_2$ row. The formula is $Z_2 = Z_1 + b_1$ (a given
test) × $c_1$ (same test). To illustrate, from Tables 56 and 58:

For Test 1: $Z_2$ = 1.000 + (.300) × (-.300) = 1.000 - .090 = .910

For Test 4: $Z_2$ = 1.000 + (.490) × (-.490) = 1.000 - .240 = .760

For Test 9: $Z_2$ = 1.000 + (.410) × (-.410) = 1.000 - .168 = .832

Step 8


Now select the test having the largest $V_2^2/Z_2$ quotient as the second
test for our battery. The quantity $V_2^2/Z_2$ is a measure of the amount
which the second test contributes to the squared multiple correlation
coefficient, $\bar{R}^2$. From Tables 55 and 56 we find that Test 9 has the
largest $V_2^2/Z_2$ quotient: $(-.324)^2$/.832 = .1261.

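Steps 7 and 8 can be sketched for a handful of tests. The code below uses only correlations quoted in the steps above (each test's r with the criterion and with Test 7), builds $V_2$ and $Z_2$, and picks the test with the largest $V_2^2/Z_2$. The function and variable names are hypothetical, and because nothing is rounded along the way the quotient for Test 9 differs slightly in the fourth decimal from the work-sheet value.

```python
# (r with criterion, r with Test 7) for a few tests, as quoted in Steps 6 and 7
tests = {1: (0.260, 0.300), 3: (0.530, 0.560), 4: (0.520, 0.490), 9: (0.550, 0.410)}
b1_criterion = -0.550   # Test 7's criterion r with sign reversed

def second_stage(r_crit, r_with_first):
    """Return V2, Z2 and V2^2/Z2 for one candidate test."""
    v1, z1 = -r_crit, 1.0
    c1 = -r_with_first               # b1 times the negative reciprocal of 1.000
    v2 = v1 + b1_criterion * c1      # V2 = V1 + b1(criterion) x c1(test)
    z2 = z1 + r_with_first * c1      # Z2 = Z1 + b1(test) x c1(test)
    return v2, z2, v2**2 / z2

quotients = {t: second_stage(*rs)[2] for t, rs in tests.items()}
best = max(quotients, key=quotients.get)
print(best, round(quotients[best], 3))
```

Only four candidate tests are included here; a full run would carry every unselected test of Table 54.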
Step 9


To calculate the new multiple correlation coefficient when Test 9
is added to Test 7, proceed as follows:


(1) The quantity .1261 (= $V_2^2/Z_2$) is entered in column b, row 2 of
Table 57.

(2) Subtract the ratio $V_2^2/Z_2$ from the $K^2$ entry in column c, row 1,
and enter the result in column c, row 2; e.g., for the entry in
column c, row 2, we have .6975 - .1261, or .5714.

(3) Find the quotient (N - 1)/(N - m). Since N = 100 and m (number
of tests chosen) = 2, we have (N - 1)/(N - m) = 99/98 or 1.010, as the
column d, row 2 entry.

(4) Record the product of the c and d columns in column e:
.5714 × 1.010 = .5771.

(5) Subtract .5771 (column e) from 1.000 to give .4229 as the
entry in column f, row 2.

(6) Take the square root of .4229 and enter the result, .6503, in
column g. This is the multiple coefficient $\bar{R}$ corrected for
chance errors. It is clear that by adding Test 9 to Test 7 we
increase $\bar{R}$ from .5500 to .6503, a substantial gain.
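The Table 57 work sheet amounts to a short loop. The sketch below takes the column b increments for the five tests in the order selected and returns the shrunken coefficient after each addition; the function name is hypothetical, and since the ratio (N - 1)/(N - m) is not rounded to three decimals here, the results may differ from the printed column g by a unit or two in the fourth decimal.

```python
import math

def shrunken_r_series(increments, n):
    """Shrunken R after each added test, following the Table 57 work sheet."""
    k2 = 1.0
    results = []
    for m, inc in enumerate(increments, start=1):
        k2 -= inc                              # column c: K^2
        k2_bar = k2 * (n - 1) / (n - m)        # column e: K^2 times (N-1)/(N-m)
        results.append(math.sqrt(1 - k2_bar))  # column g: shrunken R
    return results

# V^2/Z increments for Tests 7, 9, 3, 4, 6 as entered in column b of Table 57
series = shrunken_r_series([0.3025, 0.1261, 0.0167, 0.0066, 0.0054], n=100)
print([round(r, 3) for r in series])
```

Run over all five rows, the series rises through the fourth test and then turns down at the fifth, which is the signal to stop adding tests.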

Step 10 


Since $\bar{R}$ for Tests 7 and 9 is larger than the correlation for Test 7
alone, we proceed to add a third test in the hope of further increasing
the multiple $\bar{R}$. The procedure is shown in Step 11.


Step 11

Return to Table 58 and

(1) Record in the $a_2$ row the correlation coefficient of the second
selected test (i.e., Test 9) with each of the other tests and
with the criterion. (Read r's from Table 54.) The correlation
of Test 9 with the criterion is entered with sign reversed (i.e.,
as -.550).

(2) Enter the algebraic sum of the $a_2$ entries (i.e., 3.580) in the
Check Sum column.

(3) Draw a vertical line down through the $b_2$ and $c_2$ rows for
Test 7, the first selected test. This indicates that Test 7 has
already been chosen.

(4) Compute the $b_2$ entry for each test by adding to the $a_2$ entry
the product of the $b_1$ entry of the given test by the $c_1$ entry
of the second selected test (i.e., Test 9). The formula is
$b_2 = a_2 + b_1$ (given test) × $c_1$ (second selected test). To
illustrate:

For Test 2: $b_2$ = .230 + (.130)(-.410) = .230 - .053 = .177

For Test 6: $b_2$ = .130 + (.400)(-.410) = .130 - .164 = -.034

For Test 10: $b_2$ = .380 + (.130)(-.410) = .380 - .053 = .327

Compute $b_2$ entries for criterion and Check Sum column in
the same way. For the criterion column we have -.550
+ (-.550)(-.410) or -.324. For the Check Sum column we
have 3.580 + (3.730)(-.410) or 2.051.

(5) There are three checks for the $b_2$ row. (a) The entry for the
second selected test (Test 9) should equal the $Z_2$ entry for the
same test in Table 56. Note that both entries are .832. (b)
The entry in the criterion column should equal the $V_2$ entry of
the second selected test (Test 9) in Table 55; both entries are
-.324. (c) The entry in the Check Sum column should equal
the sum of all of the entries in the $b_2$ row. Adding .217, .177,
.320, etc., we get 2.051, checking our calculations to the third
decimal.

(6) Multiply each $b_2$ entry by the negative reciprocal of the $b_2$
entry for the second selected test (Test 9), and record results
in the $c_2$ row. The negative reciprocal of .832 is -1.202. The
$c_2$ entry for Test 1 is .217 × -1.202 or -.261; for Test 2,
.177 × -1.202 or -.213; and so on for the other tests. For
the criterion column the $c_2$ entry is (-.324) × -1.202 or .389;
and for the Check Sum the $c_2$ entry is 2.051 × -1.202 or
-2.465.

(7) There are three checks for the $c_2$ entries. (a) The $c_2$ row entry
of the second selected test (Test 9) should be -1.000. (b) The
$c_2$ entry in the Check Sum column should equal the sum of all
$c_2$ entries. Adding the $c_2$ entries in Table 58, we find the sum
to be -2.465, the Check Sum entry. (c) The product of the
$b_2$ and $c_2$ entries in the criterion column should equal the
quotient $V_2^2/Z_2$ in column b, row 2, of Table 57 in absolute value.
Note that the product (-.324 × .389) = -.1261, thus checking
our entry (disregard signs).
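One pass of Step 11 can be sketched for a few columns of Table 58. The entries below are those quoted in Steps 6 and 11; the 1.000 placed in Test 9's own column of the $a_2$ row is an assumption made by analogy with Step 6, and the function name and the dictionary layout are hypothetical.

```python
def next_rows(a_row, b1_row, selected):
    """b2 and c2 rows for the second selected test of a Wherry-Doolittle work sheet."""
    c1_selected = -b1_row[selected]   # c1 = -b1, since Test 7's b1 entry is 1.000
    b2 = {k: a_row[k] + b1_row[k] * c1_selected for k in a_row}
    c2 = {k: v * (-1.0 / b2[selected]) for k, v in b2.items()}
    return b2, c2

# "C" is the criterion column, entered with sign reversed.
b1 = {"2": 0.130, "6": 0.400, "9": 0.410, "10": 0.130, "C": -0.550}
a2 = {"2": 0.230, "6": 0.130, "9": 1.000, "10": 0.380, "C": -0.550}
b2, c2 = next_rows(a2, b1, selected="9")

print(round(b2["2"], 3), round(b2["6"], 3), round(b2["10"], 3))
print(round(b2["9"], 3), round(c2["9"], 3))      # checks (5)(a) and (7)(a)
print(round(abs(b2["C"] * c2["C"]), 4))          # check (7)(c)
```

Because intermediate values are not rounded, the last product may differ from .1261 in the fourth decimal; the three checks otherwise behave as the step describes.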


Step 12 


Draw a vertical line under Test 9 in Table 55, to indicate that it
has been selected as our second test. Then proceed as in Step 7 to
compute $V_3$ and $Z_3$ in order to select a third test. The formula for
$V_3$ is $V_3 = V_2 + b_2$ (criterion) × $c_2$ (each test). The formula for
$Z_3$ is $Z_3 = Z_2 + b_2$ (a given test) × $c_2$ (same test). The third selected
test is that one which has the largest $V_3^2/Z_3$ quotient in Table 55. This
is Test 3, for which $V_3$ = -.222 + (-.324)(-.385) or -.097; and
$Z_3$ = .686 + (.320)(-.385) = .563. The quotient $V_3^2/Z_3$ = .0167.


Step 13

Entering .0167 ($V_3^2/Z_3$) in column b, row 3, of Table 57, follow the
procedure of Step 9 to get $\bar{R}$ = .6586. Note that (N - 1)/(N - m) = 99/97
or 1.021; and that the new $\bar{R}$ is larger than the .6503 found for the
two tests 7 and 9. We include Test 3 in our battery, therefore, and
proceed to calculate $a_3$, $b_3$, and $c_3$ (Table 58), following Step 11, in
order to select a fourth test.


Step 14 


Тһе аҙ entries in Table 58 are the correlations of Test 3 with each 
of the other tests including the criterion. The criterion correlation is 
entered in the —C column with a negative sign (i.e., as —.530). 


(1) The formula for bs is bs = as + b; (given test) X су (third se- 
lected test) + b» (given test) X сә (third selected test). To 


illustrate, 

For Test 1: ba = .340+ (.300) (—.560) + (.217) (—.385) 
= .088 

For Test 4: bs = .630 + (.490) (—.560) -+ (.409) (—.385) 
= .199 


Check the b3 entries by Step 11 (5). (a) Note that the b3
entry for the third selected test (Test 3) equals the Z3 entry
for Test 3 in Table 56, namely, .563. (b) The entry in the
criterion column equals the V3 entry of the third selected test
(Test 3) in Table 55, i.e., −.097. (c) The Check Sum entry
(1.161) equals the sum of the entries in the b3 row.

(2) The formula for c3 is b3 × the negative reciprocal of the b3
entry for the third selected test (Test 3). The negative re-
ciprocal of .563 is −1.776. To illustrate the calculation for
Test 5, c3 = .146 × −1.776 = −.259. Check the c3 entries by
Step 11 (7). (a) The c3 row entry of the third selected
test (Test 3) equals −1.000. (b) The c3 entry in the
Check Sum column, namely, −2.062, equals the sum of the c3
row. (c) The product of the b3 and c3 entries in the criterion
column (namely, −.097 × .172) equals the quotient V3²/Z3
(i.e., .0167) in absolute value.
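
The arithmetic of Step 14 can be retraced in a few lines of Python. This is only a sketch: the function name `b_entry` is a hypothetical label, and every number is copied from the worked example above rather than read from Tables 55 to 58.

```python
# Sketch of the Step 14 arithmetic; all values come from the worked example.
def b_entry(a, prior_b, prior_c):
    """b for a given test: its correlation a with the newly selected test,
    plus the sum of b_k (given test) * c_k (newly selected test)."""
    return a + sum(b * c for b, c in zip(prior_b, prior_c))

# c1 and c2 entries of the third selected test (Test 3)
c_of_test3 = [-0.560, -0.385]

b3_test1 = b_entry(0.340, [0.300, 0.217], c_of_test3)
b3_test4 = b_entry(0.630, [0.490, 0.409], c_of_test3)

# c3 = b3 times the negative reciprocal of the b3 entry for Test 3 (.563)
neg_recip = -1.0 / 0.563
c3_test5 = 0.146 * neg_recip
c3_criterion = -0.097 * neg_recip

# Check (c): |b3 * c3| in the criterion column should reproduce V3^2 / Z3
check_c = abs(-0.097 * c3_criterion)

print(round(b3_test1, 3), round(b3_test4, 3))
print(round(c3_test5, 3), round(c3_criterion, 3), round(check_c, 4))
```

Small differences in the third decimal place, as for Test 4, arise because the text carries rounded intermediate values.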
Step 15


Repeat Step 12 to find V4 and Z4. The formula for V4 is
V4 = V3 + b3 (criterion) × c3 (each test). Also, the formula
for Z4 is Z4 = Z3 + b3 (a given test) × c3 (same test). For Test 4,
V4 = −.091 + (−.097)(−.353) or −.057; and Z4 = .559 + (.199)(−.353)
or .489. The quotient V4²/Z4 equals (−.057)²/.489 or .0066. While
none of the V4 entries is large, Test 4 has the largest V4²/Z4
quotient, and hence is selected as our fourth test. Enter .0066 (V4²/Z4)
in column b, row 4, of Table 57. Follow the procedure of Step 9 to get
R̄ = .6595. Note that (N − 1)/(N − m) is 99/96 or 1.031; and that the
new R̄ is but slightly larger than the R̄ of .6586 found for the three
tests, 7, 9, and 3. When R̄ decreases or fails to increase, there is no
point in adding new tests to the battery. The increase in R̄ is so small
as a result of adding Test 4 that it is hardly profitable to enlarge our
battery by a fifth test. We shall add a fifth test, however, in order to
illustrate a further step in the selection process.
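
The updating rule of Step 15 is the same for V and for Z: add the product of the relevant b and c entries to the previous value. A minimal sketch for Test 4, using only the figures quoted in the step (the helper name `update` is hypothetical):

```python
# Sketch of the Step 15 update for Test 4; values are those quoted in the text.
def update(previous, b, c):
    """New V (or Z) = previous V (or Z) + b * c."""
    return previous + b * c

V4 = update(-0.091, -0.097, -0.353)   # V3 + b3(criterion) * c3(Test 4)
Z4 = update(0.559, 0.199, -0.353)     # Z3 + b3(Test 4) * c3(Test 4)
quotient = V4 ** 2 / Z4               # the V^2/Z entry for column b of Table 57

print(round(V4, 3), round(Z4, 3), round(quotient, 4))
```

The test with the largest such quotient among those not yet selected is the next candidate for the battery.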


Step 16 


To choose a fifth test, calculate a4, b4, and c4, following Step 11,
and enter the results in Table 58. The a4 entries are the correlations
of the fourth selected test (Test 4) with each of the other tests in-
cluding the criterion (with sign reversed).


(1) The formula for b4 may readily be written by analogy to the
formulas for b3 and b2 as follows: b4 = a4 + b1 (given test)
× c1 (fourth selected test) + b2 (given test) × c2 (fourth se-
lected test) + b3 (given test) × c3 (fourth selected test). To
illustrate,

For Test 6: b4 = .300 + (.400)(−.490) + (−.034)(−.492)
+ (.179)(−.353) = .058

For Test 10: b4 = .560 + (.130)(−.490) + (.327)(−.492)
+ (.031)(−.353) = .324

Check the b4 entries by Step 11 (5). (a) The b4 entry for the
fourth selected test (Test 4) equals the Z4 entry for Test 4 in
Table 56, namely, .489. (b) The entry in the criterion column
equals the V4 entry of the fourth selected test (Test 4), i.e.,
−.057. (c) The Check Sum (.715) equals the sum of the
entries in the b4 row.

(2) To find the entries c4, multiply each b4 by the negative re-
ciprocal of the b4 entry for the fourth selected test (Test 4).
The negative reciprocal of .489 is −2.045. To illustrate,

For Test 1: c4 = −.145 × −2.045 = .297.

Check the c4 entries by Step 11 (7). (a) The c4 row entry of
the fourth selected test (Test 4) equals −1.000. (b) The c4
entry in the Check Sum column, namely, −1.462, equals the
sum of the c4 row. (c) The product of the b4 and c4 entries in
the criterion column (namely, −.057 × .117) equals the quo-
tient V4²/Z4 (i.e., .0066) in absolute value.


Step 17 


Repeat Step 12 to find V5 and Z5. V5 = V4 + b4 (criterion) × c4
(each test); and Z5 = Z4 + b4 (a given test) × c4 (same test). Test
6 has the largest V5²/Z5 quotient (i.e., .0054) and this number is
entered in column b, row 5, of Table 57. Following Step 9, we get
R̄ = .6591. This multiple correlation coefficient is smaller than the
preceding R̄. We need go no further, therefore, as we have reached
the point of diminishing returns and the addition of a sixth test will
not increase the multiple R̄. It may be noted that four (really three)
tests constitute a battery which has the highest validity of any com-
bination of tests chosen from our list of ten. The multiple R between
the criterion and all ten tests would be somewhat lower, when cor-
rected for chance error, than the R̄ we have found for our battery of
four tests. The Wherry-Doolittle method not only selects the most
economical battery but saves a large amount of statistical work.


2. Calculation of the multiple regression equation for tests selected by 
the Wherry-Doolittle Method 


Steps involved in setting up a multiple regression equation for the 
tests selected in Table 58 may be set down as follows: 
TABLE 59

        7        9        3        4       −C
c1   −1.000   −.410    −.560    −.490    .550
c2            −1.000   −.385    −.492    .389
c3                     −1.000   −.353    .172
c4                              −1.000   .117

Step 1


Draw up a work sheet like that shown in Table 59. Enter the C 
entries for the four selected tests (namely, 7, 9, 3, and 4) and for the 
eriterion, following the order in which the tests were selected for the 
battery. When equated to zero, each row in Table 59 is an equation 
defining the beta weights. 

For our four tests, the equations are

−1.000β7 − .410β9 − .560β3 − .490β4 + .550 = 0
−1.000β9 − .385β3 − .492β4 + .389 = 0
−1.000β3 − .353β4 + .172 = 0
−1.000β4 + .117 = 0
Step 2 


Solve the fourth equation to find β4 = .117.


Step 3

Substitute β4 = .117 in the third equation to get β3 = .131.


Step 4


Substitute for β3 and β4 in the second equation to get β9 = .280.
Finally, substitute for β9, β3, and β4 in the first equation to get
β7 = .305.
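
Steps 2 to 4 amount to back-substitution through the triangular set of equations in Table 59. The sketch below is an illustration only (the function name `back_substitute` is hypothetical); because each diagonal coefficient is −1.000, every row gives its β directly as the constant term plus the weighted β's already found.

```python
# Rows of Table 59 in selection order (Tests 7, 9, 3, 4).
# The last entry of each row is the -C (criterion) column.
rows = [
    [-1.000, -0.410, -0.560, -0.490, 0.550],
    [0.0, -1.000, -0.385, -0.492, 0.389],
    [0.0, 0.0, -1.000, -0.353, 0.172],
    [0.0, 0.0, 0.0, -1.000, 0.117],
]

def back_substitute(rows):
    """Solve -beta_i + sum_{j>i} c_ij * beta_j + const_i = 0 from the last row up."""
    n = len(rows)
    beta = [0.0] * n
    for i in range(n - 1, -1, -1):
        beta[i] = rows[i][n] + sum(rows[i][j] * beta[j] for j in range(i + 1, n))
    return beta

betas = back_substitute(rows)          # order: beta7, beta9, beta3, beta4
print([round(b, 3) for b in betas])
```

Carried at full precision, the solution agrees with the β's of the text to within about .001; the small differences come from rounding at each hand-computed step.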


Step 5 


The regression equation for predicting the criterion from the four
selected tests (7, 9, 3, and 4) may be written in σ-score form by
means of formula (104), page 393, as follows:

zc = β7z7 + β9z9 + β3z3 + β4z4
in which β7 = βc7.934; β9 = βc9.734; β3 = βc3.794; β4 = βc4.793
Substituting for the β's we have
zc = .305z7 + .280z9 + .131z3 + .117z4.
To predict the criterion score of any subject in our group, substitute
his scores in Tests 7, 9, 3, and 4 (expressed as σ-scores) in this
equation.


Step 6 


To write the regression equation in score form the β's must be
transformed into b's by means of formula (103), page 393, as follows:


b7 = (σc/σ7)β7; b9 = (σc/σ9)β9; b3 = (σc/σ3)β3; b4 = (σc/σ4)β4

The σ's are the SD's of the test scores: σ7 of Test 7, σ9 of Test 9, σc of
the criterion, etc. In general, bp = (σc/σp)βp.


Step 7

The regression equation in score form may now be written
Xc = b7X7 + b9X9 + b3X3 + b4X4 + K *          (99), page 391
and the σ(est. Xc) = σc √(1 − R²c(7934))          (33), page 162


3. Checking the weights and multiple R 


Step 1

The β weights may be checked by formula (108), page 396, in
which R² is expressed in terms of beta coefficients. In the present
example, we have


R²c(7934) = β7rc7 + β9rc9 + β3rc3 + β4rc4


in which c equals the criterion and the r's are the correlations between
the criterion (c) and Tests 7, 9, 3, and 4. Substituting for the r's
and β's (computed in the last section) we have


R²c(7934) = .305 × .550 + .280 × .550 + .131 × .530 + .117 × .520
= .1678 + .1540 + .0694 + .0608 = .4520
Rc(7934) = .6723


From R²c(7934) we know that our battery accounts for 45% of the
variance of the criterion. Also (p. 396) our four tests (7, 9, 3, and 4)
contribute 17%, 15%, 7%, and 6%, respectively, to the variance of
the criterion.
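
The check in Step 1 is a sum of products, one for each test. A short sketch, with the β's and criterion correlations copied from the substitution shown in the text (the dictionary names are hypothetical):

```python
import math

# Beta weights and criterion correlations for Tests 7, 9, 3, and 4 (from the text).
betas = {7: 0.305, 9: 0.280, 3: 0.131, 4: 0.117}
r_criterion = {7: 0.550, 9: 0.550, 3: 0.530, 4: 0.520}

# Each product beta * r is that test's contribution to R^2.
contributions = {test: betas[test] * r_criterion[test] for test in betas}
R2 = sum(contributions.values())
R = math.sqrt(R2)

for test, share in contributions.items():
    print(f"Test {test}: {share:.4f}")
print(round(R2, 4), round(R, 4))
```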


* This equation is not written for our four tests because means and SD’s are 
not given in Table 54. 
Step 2


The R² of .4520 calculated above should equal (1 − K²) when K²
is taken from column c, row 4, in Table 57. From Table 57 we find
that 1 − K² = 1 − .5481 or .4519, which checks the R² found above,
and hence the β weights, very closely.


Step 3 


It will be noted that the multiple correlation coefficient of .6723
found above is somewhat larger than the shrunken R̄ of .6595 found
between the criterion and our battery of four tests in Table 57. The
multiple correlation coefficient obtained from a sample always tends,
through the operation of chance errors, to be larger than the cor-
relation in the population from which the sample was drawn, espe-
cially when N is small or the number of test variables large. For
this reason, the calculated R must be “adjusted” in order to give us a
better estimate of the correlation in the population.* The relation-
ship of the R̄, corrected for chance errors, to the R as usually calcu-
lated, is given by the following equation:


R̄² = [(N − 1)R² − (m − 1)] / (N − m)          (110)
(relation of R to R̄ corrected for chance errors)

Substituting .4520 for R², 99 for (N − 1), 96 for (N − m) and 3 for
(m − 1), we have from (110) that


R̄² = (99 × .4520 − 3) / 96 = .4349
and
R̄ = .6595 (see Table 57)


The R̄ of .6595 is the corrected multiple correlation between our cri-
terion and test battery, or the multiple correlation coefficient esti-
mated for the population from which our sample was drawn. In
the present problem, shrinkage in multiple R is quite small
(.6723 − .6595 = .0128) as the sample is fairly large and there are
only four tests in the multiple regression equation.
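
Formula (110) is easily put into a small function. The sketch below assumes, as in the worked example, that m is the number of tests in the battery; the function name `shrunken_r` is hypothetical.

```python
import math

def shrunken_r(R2, N, m):
    """Multiple R corrected for chance errors, after formula (110):
    R-bar^2 = ((N - 1) * R^2 - (m - 1)) / (N - m)."""
    adjusted = ((N - 1) * R2 - (m - 1)) / (N - m)
    # A negative adjusted value means no dependable correlation remains.
    return math.sqrt(adjusted) if adjusted > 0 else 0.0

r_bar = shrunken_r(0.4520, N=100, m=4)
shrinkage = math.sqrt(0.4520) - r_bar
print(round(r_bar, 4), round(shrinkage, 4))
```

The same quantity can be written as 1 − (1 − R²)(N − 1)/(N − m), which is why the ratio (N − 1)/(N − m) is noted at each step of the selection procedure.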


* Ezekiel, M., Methods of Correlation Analysis, op. cit., 323-324. 


II. Limitations to the Use of Partial
and Multiple Correlation


Certain cautions in the use of partial and multiple correlation may 
be indicated in concluding this chapter. 

(1) In order that partial coefficients of correlation be valid meas- 
ures of relationship, it is necessary that all zero order coefficients be 
computed from data in which the regression is linear. 

(2) The number of cases in a multiple correlation problem should
be large, especially if there are a number of variables; otherwise the 
coefficients calculated from the data will have little significance. 
Coefficients which are misleadingly high or low may be obtained 
when studies which involve many variables are based on relatively 
few cases. The question of accuracy of computation is also involved. 
A general rule advocated by many workers is that results should be 
carried to as many decimals as there are variables in the problem. 
How strictly this rule is to be followed must depend upon the accu- 
racy of the original measures. 

(3) A serious limitation to a clear-cut interpretation of a partial r 
arises from the fact that most of the tests employed by psychologists
probably depend upon a large number of “determiners.” When we 
“partial out” the influence of clear-cut and relatively objective fac- 
tors such as age, height, school grade, etc., we have a reasonably clear 
notion of what the “partials” mean. But when we attempt to render 
variability due to “logical memory” constant by partialling out 
memory test scores from the correlation between general intelligence 
test scores and educational achievement, the result is by no means so 
unequivocal. The abilities determining the scores in general intelli- 
gence and in school achievement undoubtedly overlap the memory 
test in other respects than in the “memory” involved. Partialling out 
a memory test score from the correlation between general intelligence 
and educational achievement, therefore, will render constant the in-
fluence of many factors not strictly “memory,” i.e., partial out too
much.

To illustrate this point again it would be fallacious to interpret 
the partial correlation between reading comprehension and arithme- 
tic, say, with the influence of “general intelligence” partialled out, 
as giving the net relationship between these two variables for a con- 
stant degree of intelligence. Both reading and arithmetic enter with 
heavy, but unknown, weight into most general intelligence tests;
hence the partial correlation between these two, for general intelli- 
gence constant, cannot be interpreted in a clear-cut and meaning- 
ful way. 

Partial r's obtained from psychological and educational tests,
though often difficult to interpret, may be used in multiple regression 
equations when the purpose is to determine the relative weight to 
be assigned the various tests of a battery. But we should be cautious 
in attempting to give psychological meaning to such residual, i.e., 
partial, r’s. Several writers have discussed this problem, and should 
be referred to by the investigator who plans to use partial and multi- 
ple correlation extensively.* 

(4) Perhaps the chief limitation to R, the coefficient of multiple
correlation, is the fact that, since it is always positive, variable
errors of sampling tend to accumulate and thus make the coefficient
too large. A correction to be applied to R, when the sample is small
and the number of variables large, has been given on page 407. This 
correction gives the value which R would most probably take in the 
population from which our sample was drawn. 


PROBLEMS 


1. The following data† were assembled for sixteen large cities (of around
500,000 inhabitants) in a study of factors making for variation in crime.


Xc (criterion) = crime rate: number known offenses per 1000 inhabi-
tants

X1 = percentage of male inhabitants

X2 = percentage of male native whites of native parentage

X3 = percentage of foreign-born males

X4 = number children under five per 1000 married women
fifteen to forty-four years old

X5 = number Negroes per 100 of population

X6 = number male children of foreign-born parents per 100
of population

X7 = number males and females ten years and over, in man-
ufacturing, per 100 of population

* Burks, B. S., “On the Inadequacy of the Partial and Multiple Correlation
Technique,” Journal of Educational Psychology, 1926, 17, 532-540.

Moore, T. V., “Partial Correlations,” Studies in Psychology and Psychiatry
from the Catholic University of America, 1932, 3, 1-39.

† Ogburn, W. F., “Factors in the Variation of Crime Among Cities,” Journal
of the American Statistical Association, 1935, 30, 12-34.
Mc = 19.9   M1 = 49.2   M2 = 22.8   M3 = 10.2   M4 = 481.4   M5 = 4.7
σc = 7.9    σ1 = 1.3    σ2 = 7.2    σ3 = 4.6    σ4 = 74.4    σ5 = 4.0
M6 = 13.1   M7 = 21.7
σ6 = 4.2    σ7 = 4.3


Intercorrelations 
1 2 3 4 5 6 7 
с Lu ААН 3 га 5 -м -20 
1 (01 25 —.19 —15 01 22 
2 —.92 —.54 55 —.93 — 30 
3 44 —.68 82 40 
4 — 06 52 74 
5 —.67 —.14 
6 21 


(a) By means of the Wherry-Doolittle method select those variables
which give a maximum correlation with the criterion.

(b) Work out the regression equation in score form (p. 393) and
σ(est. Xc).

(c) Determine the independent contribution of each of the selected
factors to crime rate (to R²).

(d) Compare R and R̄. Why is the adjustment fairly large? (see p. 418)


2. (a) What is the probable crime rate (from Problem 1) for a city in
which X6 = 15.0, X1 = 50%, X5 = 6.0 and X7 = 20.0?
(b) For a city in which X6 = 13, X1 = 48%, X5 = 5.0 and X7 = 22.0?
(c) By how much does the use of multiple R reduce σ(est. Xc)?


3. In Problem 4, page 402:
(a) Work out the regression equation using the Wherry-Doolittle
method.
(b) How much shrinkage is there when the multiple R is corrected for
chance errors (p. 407)?


ANSWERS


1. (a) The R̄'s are, for Test 6, .540; for Tests 6 and 1, .674; for Tests 6,
1, and 5, .713; for Tests 6, 1, 5, and 7, .722. R̄ drops to .702 when
Test 4 is added.

(b) Xc = −.42X6 + 3.35X1 + .82X5 − .40X7 − 134.59.
σ(est. Xc) = 5.47

(c) R²c(6157) = .121 + .242 + .210 + .043. Tests 6, 1, 5, and 7 con-
tribute 12%, 24%, 21%, and 4%, respectively.

(d) R = .785; R̄ = .722; shrinkage is .063.

2. (a) 23.53
(b) 16.05
(c) From 7.9 to 5.5 or 30%

3. (b) R̄ is .59; R = .60
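
The answers to Problem 2 can be verified by substituting the city values in the regression equation of answer 1 (b). In the sketch below the pairing of coefficients with X6, X1, X5, and X7 is an assumption, reconstructed from the order in which the variables were selected; the function name is hypothetical.

```python
def predicted_crime_rate(x6, x1, x5, x7):
    """Score-form regression equation from answer 1 (b)."""
    return -0.42 * x6 + 3.35 * x1 + 0.82 * x5 - 0.40 * x7 - 134.59

city_a = predicted_crime_rate(x6=15.0, x1=50.0, x5=6.0, x7=20.0)
city_b = predicted_crime_rate(x6=13.0, x1=48.0, x5=5.0, x7=22.0)

# Proportional reduction in the standard error of estimate, answer 2 (c)
reduction = (7.9 - 5.5) / 7.9

print(round(city_a, 2), round(city_b, 2), round(100 * reduction))
```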
TABLE A Fractional parts of the total area (taken as 10,000) under the
normal probability curve, corresponding to distances on the
baseline between the mean and successive points laid off from
the mean in units of standard deviation


Example: between the mean and a point 1.38σ (x/σ = 1.38) are found
41.62% of the entire area under the curve.


x/σ   .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
0.0   0000   0040   0080   0120   0160   0199   0239   0279   0319   0359
0.1   0398   0438   0478   0517   0557   0596   0636   0675   0714   0753
0.2   0793   0832   0871   0910   0948   0987   1026   1064   1103   1141
0.3   1179   1217   1255   1293   1331   1368   1406   1443   1480   1517
0.4   1554   1591   1628   1664   1700   1736   1772   1808   1844   1879
0.5   1915   1950   1985   2019   2054   2088   2123   2157   2190   2224
0.6   2257   2291   2324   2357   2389   2422   2454   2486   2517   2549
0.7   2580   2611   2642   2673   2704   2734   2764   2794   2823   2852
0.8   2881   2910   2939   2967   2995   3023   3051   3078   3106   3133
0.9   3159   3186   3212   3238   3264   3290   3315   3340   3365   3389
1.0   3413   3438   3461   3485   3508   3531   3554   3577   3599   3621
1.1   3643   3665   3686   3708   3729   3749   3770   3790   3810   3830
1.2   3849   3869   3888   3907   3925   3944   3962   3980   3997   4015
1.3   4032   4049   4066   4082   4099   4115   4131   4147   4162   4177
1.4   4192   4207   4222   4236   4251   4265   4279   4292   4306   4319
1.5   4332   4345   4357   4370   4383   4394   4406   4418   4429   4441
1.6   4452   4463   4474   4484   4495   4505   4515   4525   4535   4545
1.7   4554   4564   4573   4582   4591   4599   4608   4616   4625   4633
1.8   4641   4649   4656   4664   4671   4678   4686   4693   4699   4706
1.9   4713   4719   4726   4732   4738   4744   4750   4756   4761   4767
2.0   4772   4778   4783   4788   4793   4798   4803   4808   4812   4817
2.1   4821   4826   4830   4834   4838   4842   4846   4850   4854   4857
2.2   4861   4864   4868   4871   4875   4878   4881   4884   4887   4890
2.3   4893   4896   4898   4901   4904   4906   4909   4911   4913   4916
2.4   4918   4920   4922   4925   4927   4929   4931   4932   4934   4936
2.5   4938   4940   4941   4943   4945   4946   4948   4949   4951   4952
2.6   4953   4955   4956   4957   4959   4960   4961   4962   4963   4964
2.7   4965   4966   4967   4968   4969   4970   4971   4972   4973   4974
2.8   4974   4975   4976   4977   4977   4978   4979   4979   4980   4981
2.9   4981   4982   4982   4983   4984   4984   4985   4985   4986   4986
3.0   4986.5 4986.9 4987.4 4987.8 4988.2 4988.6 4988.9 4989.3 4989.7 4990.0
3.1   4990.3 4990.6 4991.0 4991.3 4991.6 4991.8 4992.1 4992.4 4992.6 4992.9
3.2   4993.129
3.3   4995.166
3.4   4996.631
3.5   4997.674
3.6   4998.409
3.7   4998.922
3.8   4999.277
3.9   4999.519
4.0   4999.683
4.5   4999.966
5.0   4999.997133
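
Any entry of Table A can be reproduced from the error function of the Python standard library. The sketch below is an illustration; the function name `area_mean_to` is hypothetical.

```python
import math

def area_mean_to(x_over_sigma):
    """Area, out of 10,000, between the mean and a point x/sigma from it."""
    return 10000 * 0.5 * math.erf(abs(x_over_sigma) / math.sqrt(2))

print(round(area_mean_to(1.38)))      # the example entry of Table A
print(round(area_mean_to(3.2), 3))    # one of the extended entries
```

Because the curve is symmetrical, the same area is returned for a point below the mean as for one the same distance above it.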
TABLE B Ordinates of the normal probability curve expressed as frac-
tional parts of the mean ordinate, y0


The height of the ordinate erected at the mean can be computed from
y0 = N/(σ√(2π)), where √(2π) = 2.51 and 1/√(2π) = .3989. The height
of any other ordinate, in terms of y0, can be read from the table when
one knows the distance which the ordinate is from the mean. For
example: the height of an ordinate a distance of −2.37σ from the mean
is .06029 y0. Decimals have been omitted in the body of the table.
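
The ordinates described for Table B follow from the equation of the normal curve: the height at a distance x/σ from the mean, as a fraction of y0, is exp(−(x/σ)²/2). A minimal sketch, with hypothetical function names:

```python
import math

def mean_ordinate(N, sigma):
    """Height of the ordinate at the mean: y0 = N / (sigma * sqrt(2 * pi))."""
    return N / (sigma * math.sqrt(2 * math.pi))

def relative_ordinate(x_over_sigma):
    """Height of an ordinate as a fractional part of y0."""
    return math.exp(-x_over_sigma ** 2 / 2)

print(round(mean_ordinate(1, 1), 4))
print(round(relative_ordinate(-2.37), 4))
```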
TABLE C Conversion of a Pearson r into a corresponding Fisher's z
coefficient


r     z    | r     z    | r     z    | r     z     | r      z     | r      z
.25   .26  | .40   .42  | .55   .62  | .70   .87   | .85    1.26  | .950   1.83
.26   .27  | .41   .44  | .56   .63  | .71   .89   | .86    1.29  | .955   1.89
.27   .28  | .42   .45  | .57   .65  | .72   .91   | .87    1.33  | .960   1.95
.28   .29  | .43   .46  | .58   .66  | .73   .93   | .88    1.38  | .965   2.01
.29   .30  | .44   .47  | .59   .68  | .74   .95   | .89    1.42  | .970   2.09
.30   .31  | .45   .48  | .60   .69  | .75   .97   | .90    1.47  | .975   2.18
.31   .32  | .46   .50  | .61   .71  | .76   1.00  | .905   1.50  | .980   2.30
.32   .33  | .47   .51  | .62   .73  | .77   1.02  | .910   1.53  | .985   2.44
.33   .34  | .48   .52  | .63   .74  | .78   1.05  | .915   1.56  | .990   2.65
.34   .35  | .49   .54  | .64   .76  | .79   1.07  | .920   1.59  | .995   2.99
.35   .37  | .50   .55  | .65   .78  | .80   1.10  | .925   1.62  |
.36   .38  | .51   .56  | .66   .79  | .81   1.13  | .930   1.66  |
.37   .39  | .52   .58  | .67   .81  | .82   1.16  | .935   1.70  |
.38   .40  | .53   .59  | .68   .83  | .83   1.19  | .940   1.74  |
.39   .41  | .54   .60  | .69   .85  | .84   1.22  | .945   1.78  |
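
Fisher's z is the inverse hyperbolic tangent of r, so entries of Table C, and values of r not listed there, can be computed directly. A brief sketch with hypothetical function names:

```python
import math

def fisher_z(r):
    """Fisher's z for a Pearson r: z = 0.5 * ln((1 + r) / (1 - r))."""
    return math.atanh(r)

def r_from_z(z):
    """Convert a Fisher z back into the corresponding r."""
    return math.tanh(z)

for r in (0.50, 0.85, 0.995):
    print(r, round(fisher_z(r), 2))
```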
TABLE D Table of t, for use in determining the reliability of statistics


Example: When the df are 35 and t = 2.03, the .05 in column 3 means that 5
times in 100 trials a divergence as large as that obtained may be expected in the
positive and negative directions.


Degrees of         Probability (P)
Freedom    0.10        0.05         0.02         0.01
1          t = 6.34    t = 12.71    t = 31.82    t = 63.66
2          2.92        4.30         6.96         9.92
3          2.35        3.18         4.54         5.84
4          2.13        2.78         3.75         4.60
5          2.02        2.57         3.36         4.03
6          1.94        2.45         3.14         3.71
7          1.90        2.36         3.00         3.50
8          1.86        2.31         2.90         3.36
9          1.83        2.26         2.82         3.25
10         1.81        2.23         2.76         3.17
11         1.80        2.20         2.72         3.11
12         1.78        2.18         2.68         3.06
13         1.77        2.16         2.65         3.01
14         1.76        2.14         2.62         2.98
15         1.75        2.13         2.60         2.95
16         1.75        2.12         2.58         2.92
17         1.74        2.11         2.57         2.90
18         1.73        2.10         2.55         2.88
19         1.73        2.09         2.54         2.86
20         1.72        2.09         2.53         2.84
21         1.72        2.08         2.52         2.83
22         1.72        2.07         2.51         2.82
23         1.71        2.07         2.50         2.81
24         1.71        2.06         2.49         2.80
25         1.71        2.06         2.48         2.79
26         1.71        2.06         2.48         2.78
27         1.70        2.05         2.47         2.77
28         1.70        2.05         2.47         2.76
29         1.70        2.04         2.46         2.76
30         1.70        2.04         2.46         2.75
35         1.69        2.03         2.44         2.72
40         1.68        2.02         2.42         2.71
45         1.68        2.02         2.41         2.69
50         1.68        2.01         2.40         2.68
60         1.67        2.00         2.39         2.66
70         1.67        2.00         2.38         2.65
80         1.66        1.99         2.38         2.64
90         1.66        1.99         2.37         2.63
100        1.66        1.98         2.36         2.63
125        1.66        1.98         2.36         2.62
150        1.66        1.98         2.35         2.61
200        1.65        1.97         2.35         2.60
300        1.65        1.97         2.34         2.59
400        1.65        1.97         2.34         2.59
500        1.65        1.96         2.33         2.59
1000       1.65        1.96         2.33         2.58
∞          1.65        1.96         2.33         2.58
TABLE G To facilitate the calculation of T-scores


The percents refer to the percentage of the total frequency below a
given score + 1/2 of the frequency on that score. T-scores are read
directly from the given percentages.

Percent    T-score    Percent    T-score
.0032      10         53.98      51
.0048      11         57.93      52
.007       12         61.79      53
.011       13         65.54      54
.016       14         69.15      55
.023       15         72.57      56
.034       16         75.80      57
.048       17         78.81      58
.069       18         81.59      59
.097       19         84.13      60
.13        20         86.43      61
.19        21         88.49      62
.26        22         90.32      63
.35        23         91.92      64
.47        24         93.32      65
.62        25         94.52      66
.82        26         95.54      67
1.07       27         96.41      68
1.39       28         97.13      69
1.79       29         97.72      70
2.28       30         98.21      71
2.87       31         98.61      72
3.59       32         98.93      73
4.46       33         99.18      74
5.48       34         99.38      75
6.68       35         99.53      76
8.08       36         99.65      77
9.68       37         99.74      78
11.51      38         99.81      79
13.57      39         99.865     80
15.87      40         99.903     81
18.41      41         99.931     82
21.19      42         99.952     83
24.20      43         99.966     84
27.43      44         99.977     85
30.85      45         99.984     86
34.46      46         99.9890    87
38.21      47         99.9928    88
42.07      48         99.9952    89
46.02      49         99.9968    90
50.00      50
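
The entries of Table G follow from the normal curve: the percent below a score is converted to a σ-distance from the mean, and T = 50 + 10 times that distance. The sketch below uses the standard-library `statistics.NormalDist`; the function names are hypothetical.

```python
from statistics import NormalDist

def percent_below(freq_below, freq_on_score, total):
    """Percent of the total frequency below a score plus half of those on it."""
    return 100 * (freq_below + 0.5 * freq_on_score) / total

def t_score(percent):
    """T-score for a given percent (strictly between 0 and 100)."""
    z = NormalDist().inv_cdf(percent / 100)
    return 50 + 10 * z

print(round(t_score(84.13)), round(t_score(2.28)), round(t_score(50.0)))
print(round(t_score(percent_below(30, 10, 100))))
```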
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
1 270218 196 181 170 100 151 144 137 131 125 120 115 110 106 102 
3 244 207 189 175 165 156 148 141 134 128 122 118 112 108 104 99 
3 228 198 182 170 160 152 144 137 131 125 120 115 110 106 102 97 
4 216 191 177 165 156 148 141 134 128 123 118 113 108 104 100 
5 210 185 172 161 152 145 138 131 126 120 115 111 106 102 
© 199 179 167 157 149 141 135 129 123 118 113 108 104 100 
7 192 174 163 153 145 138 132 126 121 116 111 106 102 
8 186 170 159 150 142 135 128 124 118 113 109 104 100 
9 181 165 155 147 139 133 126 121 116 111 106 102 08 

10 176 161 151 143 136 130 124 119 114 109 104 100 06 
11 171 158 148 140 134 127 122 116 111 107 102 98 94 
12 167 154 145 138 131 125 119 114 109 105 100 90 92 


13 163 151 142 135 128 122 117 112 107 103 
14 159 147 139 132 126 120 115 110 105 101 


18 156 144 136 129 123 118 113 108 103 
16 152 141 134 127 121 116 111 106 101 
17 149 139 131 125 119 113 109 104 
18 146 136 129 122 117 111 106 102 
19 143 133 126 120 114 109 105 100 
30 140 131 124 118 112107 103 
31 137 128 121 116 110 105 101 
22 135 126 119 113 108 103 
33 132 124 117 111 106 101 
34 130 121 115 109 104 100 
25 127 119 113 107 102 
26 125 117 111 105 101 
37 123 115109104 
38 120 113 107 102 
29 118 111 105 100 
30 116109103 98 
31 114 107 101 
32 112105 99 
99
97 
% 
9з 
92 


16 
97 94 90 86 82 79 76 72 
95 92 88 84 81 77 74 71 
94 90 86 82 79 76 72 69 
96 92 88 84 81 77 74 71 67 
98 94 90 86 82 70 76 72 69 66 
96 92 88 84 81 77 74 71 68 64 
98 94 90 86 83 79 70 72 09 66 03 
96 92 88 84 81 77 74 71 68 64 61 
М 90 80 83 79 76 73 69 66 63 00 
92 88 85 БІ 78 74 71 68 05 62 59 
90 87 83 79 76 73 69 66 03 60 57 
80 85 81 78 74 71 08 05 02 59 50 
99 94 91 87 83 80 76 73 70 00 63 00 57 54 ^ 
97 03 89 85 81 78 75 71 08 65 02 59 50 53 
95 91 87 83 80 76 73 70 66 63 60 57 54 51 
93 89 85 82 78 75 71 68 05 62 59 56 53 50 
91 87 84 80 77 73 70 67 04 00 57 54 52 49 | 
89 86 82 78 75 72 68 65 02 50 56 53 50 47 
88 84 80 77 73 70 07 04 01 58 55 52 49 46 
86 82 79 75 72 69 65 62 50 56 53 50 47 45 
84 81 77 74 70 67 64 00 58 55 52 49 46 43 
83 79 76 72 69 60 62 50 56 53 50 48 45 42 | 
81 78 74 71 67 04 61 58 55 52 49 46 43 41 І 
80 76 73 09 66 63 00 57 54 51 48 45 42 39 
78 74 71 68 64 61 58 55 52 40 46 43 41 38 
16 73 70 66 63 60 57 54 51 48 45 42 39 37 
75 71 68 65 62 58 55 52 49 46 44 41 38 35 
73 70 67 63 60 57 54 51 48 45 42 39 37 
72 68 05 62 59 56 53 50 47 44 4l 38 
70 67 64 60 57 54 51 48 45 42 40 
69 65 62 50 56 53 50 47 44 41 
67 064 01 58 54 51 48 40 43 
66 63 59 56 53 50 47 44 
8 61 58 55 52 49 46 
63 60 56 53 047, 
61 58 55 52 49 
60 57 54 51 
59.55 52 
57 54 
56 
AMNEM 2M 3.52 3055 263/25 09/40 q1 A3 a3 Z3 M6 eT eu 05
a0 M EP 80 BT 94 5148 45 43 40 37 25 32 29 27 24 21 19 16 14 11 00 08 OF On 
3 0 SL 58 55 52 50 47 44 41 39 36 33 31 28 25 25 20 18 15 13 10 08 0r or 
100 & 0057 54 51 48 45 43 40 37 35 22 29 27 24 2I 19 16 14 11 09 06 OF 
4 04 OI 58 55 52 50 47 44 41 30 36 33 31 28 25 23 20 18 15 13 10 08 OF 
E6900 57 54 51 48 45 43 40 37 35 32 29 27-24 21 19 16 14 11 09 06 
$ 61 58 55 53 50 47 44 41 39 36 33 31 28 25 23 20 18 15 13 10 08 
T GO 57 54 51 48 45 43 40 37 35 32 20 27 24 21 19 16 14 11 09 
85855 52 50 47 44 41 39 36 33 31 28 25 23 20 18 15 13 10 
9 57 54 51 48 46 43 40 37 35 32 20 27 24 21 19 16 14 11 
10 56 53 50 47 44 41 30 36 33 31 28 25 23 20 18 15 13 
її 54 51 48 46 43 40 37 35 32 29 27 24 22 19 16 14 
12 53 50 47 44 41 39 30 33 31 28 25 23 20 18 15 
18 51 48 46 43 40 37 35 32 29 27 24 22 19 16 
14 50 47 44 42 39 30 33 31 28 25 23 20 18 
18 49 46 43 40 37 35 32 29 27 24 22 19 
16 47 44 42 39 30 33 31 28 26 23 20 
17 46 43 40 37 35 32 29 27 24 22 
18 44 42 39 36 33 31 28 26 23 
19 43 40 38 35 32 30 27 24 
10 42 39 36 34 31 28 26 
21 40 38 35 32 30 27 
22 39 36 34 31 28 
23 38 35 32 30 
34 36 34 31 
25 35 32 
26 34 


TABLE H Mean σ-distances from the mean, of various percents of a
normal distribution


[.26 × 20 + (−.13 × 10)] / 30

or .13σ (20% lie to the right of mean and 10% to left, see page 320).
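
The mean σ-distance of any segment of a normal distribution can also be computed directly: it is the difference between the ordinates at the two boundaries divided by the proportion of cases in the segment. The sketch below reproduces the example above on that assumption; the function name `mean_distance` is hypothetical, and the segment is described by the cumulative proportions at its lower and upper boundaries.

```python
from statistics import NormalDist

nd = NormalDist()

def mean_distance(p_low, p_high):
    """Mean sigma-distance from the mean of the cases lying between the
    cumulative proportions p_low and p_high of a normal distribution."""
    def ordinate(p):
        # The ordinate vanishes at the extreme ends of the distribution.
        return 0.0 if p in (0.0, 1.0) else nd.pdf(nd.inv_cdf(p))
    return (ordinate(p_low) - ordinate(p_high)) / (p_high - p_low)

right_20 = mean_distance(0.50, 0.70)   # 20% just to the right of the mean
left_10 = mean_distance(0.40, 0.50)    # 10% just to the left of the mean
combined = (right_20 * 20 + left_10 * 10) / 30
direct = mean_distance(0.40, 0.70)     # the same 30% taken as one segment

print(round(right_20, 2), round(left_10, 2), round(combined, 2), round(direct, 2))
```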
TABLE I A table to infer the value of √(1 − r²) from a given value of r


r        √(1 − r²)    r        √(1 − r²)    r        √(1 − r²)
.0000    1.0000       .3400    .9404        .6800    .7332
.0100                 .3500    .9367        .6900    .7238
TABLE J Coefficients of correlation significant at the 5% level and at
the 1% level for varying degrees of freedom


Degrees of              Number of Variables
Freedom    Level    2       4       5       6       7       9
1          5%      .997    .999    .999   1.000   1.000   1.000
           1%     1.000   1.000   1.000   1.000   1.000   1.000
2          5%      .950    .983    .987    .990    .992    .994
           1%      .990    .997    .998    .998    .998    .999
3          5%      .878    .950    .961    .968    .973    .979
           1%      .959    .983    .987    .990    .991    .993
4          5%      .811    .912    .930    .942    .950    .961
           1%      .917    .962    .970    .975    .979    .984
5          5%      .754    .874    .898    .914    .925    .941
           1%      .874    .937    .949    .957    .963    .971
6          5%      .707    .839    .867    .886    .900    .920
           1%      .834    .911    .927    .938    .946    .957
7          5%      .666    .807    .838    .860    .876    .900
           1%      .798    .885    .904    .918    .928    .942
8          5%      .632    .777    .811    .835    .854    .880
           1%      .765    .860    .882    .898    .909    .926
9          5%      ...     .750    .786    .812    .832    .861
           1%      ...     .836    .861    .878    .891    .911
TABLE OF SQUARES AND SQUARE ROOTS
OF THE NUMBERS FROM 1 TO 1000


Number  Square   Square Root   Number  Square    Square Root
1       1        1.000         51      26 01     7.141
2       4        1.414         52      27 04     7.211
3       9        1.732         53      28 09     7.280
4       16       2.000         54      29 16     7.348
5       25       2.236         55      30 25     7.416
6       36       2.449         56      31 36     7.483
7       49       2.646         57      32 49     7.550
8       64       2.828         58      33 64     7.616
9       81       3.000         59      34 81     7.681
10      1 00     3.162         60      36 00     7.746
11      1 21     3.317         61      37 21     7.810
12      1 44     3.464         62      38 44     7.874
13      1 69     3.606         63      39 69     7.937
14      1 96     3.742         64      40 96     8.000
15      2 25     3.873         65      42 25     8.062
16      2 56     4.000         66      43 56     8.124
17      2 89     4.123         67      44 89     8.185
18      3 24     4.243         68      46 24     8.246
19      3 61     4.359         69      47 61     8.307
20      4 00     4.472         70      49 00     8.367
21      4 41     4.583         71      50 41     8.426
22      4 84     4.690         72      51 84     8.485
23      5 29     4.796         73      53 29     8.544
24      5 76     4.899         74      54 76     8.602
25      6 25     5.000         75      56 25     8.660
26      6 76     5.099         76      57 76     8.718
27      7 29     5.196         77      59 29     8.775
28      7 84     5.292         78      60 84     8.832
29      8 41     5.385         79      62 41     8.888
30      9 00     5.477         80      64 00     8.944
31      9 61     5.568         81      65 61     9.000
32      10 24    5.657         82      67 24     9.055
33      10 89    5.745         83      68 89     9.110
34      11 56    5.831         84      70 56     9.165
35      12 25    5.916         85      72 25     9.220
36      12 96    6.000         86      73 96     9.274
37      13 69    6.083         87      75 69     9.327
38      14 44    6.164         88      77 44     9.381
39      15 21    6.245         89      79 21     9.434
40      16 00    6.325         90      81 00     9.487
41      16 81    6.403         91      82 81     9.539
42      17 64    6.481         92      84 64     9.592
43      18 49    6.557         93      86 49     9.644
44      19 36    6.633         94      88 36     9.695
45      20 25    6.708         95      90 25     9.747
46      21 16    6.782         96      92 16     9.798
47      22 09    6.856         97      94 09     9.849
48      23 04    6.928         98      96 04     9.899
49      24 01    7.000         99      98 01     9.950
50      25 00    7.071         100     1 00 00   10.000


Number  Square    Square Root   Number  Square    Square Root
101     1 02 01   10.050        151     2 28 01   12.288
102     1 04 04   10.100        152     2 31 04   12.329
103     1 06 09   10.149        153     2 34 09   12.369
104     1 08 16   10.198        154     2 37 16   12.410
105     1 10 25   10.247        155     2 40 25   12.450
106     1 12 36   10.296        156     2 43 36   12.490
107     1 14 49   10.344        157     2 46 49   12.530
108     1 16 64   10.392        158     2 49 64   12.570
109     1 18 81   10.440        159     2 52 81   12.610
110     1 21 00   10.488        160     2 56 00   12.649
111     1 23 21   10.536        161     2 59 21   12.689
112     1 25 44   10.583        162     2 62 44   12.728
113     1 27 69   10.630        163     2 65 69   12.767
114     1 29 96   10.677        164     2 68 96   12.806
115     1 32 25   10.724        165     2 72 25   12.845
116     1 34 56   10.770        166     2 75 56   12.884
117     1 36 89   10.817        167     2 78 89   12.923
118     1 39 24   10.863        168     2 82 24   12.961
119     1 41 61   10.909        169     2 85 61   13.000
120     1 44 00   10.954        170     2 89 00   13.038
121     1 46 41   11.000        171     2 92 41   13.077
122     1 48 84   11.045        172     2 95 84   13.115
123     1 51 29   11.091        173     2 99 29   13.153
124     1 53 76   11.136        174     3 02 76   13.191
125     1 56 25   11.180        175     3 06 25   13.229
126     1 58 76   11.225        176     3 09 76   13.266
127     1 61 29   11.269        177     3 13 29   13.304
128     1 63 84   11.314        178     3 16 84   13.342
129     1 66 41   11.358        179     3 20 41   13.379
130     1 69 00   11.402        180     3 24 00   13.416
131     1 71 61   11.446        181     3 27 61   13.454
132     1 74 24   11.489        182     3 31 24   13.491
133     1 76 89   11.533        183     3 34 89   13.528
134     1 79 56   11.576        184     3 38 56   13.565
135     1 82 25   11.619        185     3 42 25   13.601
136     1 84 96   11.662        186     3 45 96   13.638
137     1 87 69   11.705        187     3 49 69   13.675
138     1 90 44   11.747        188     3 53 44   13.711
139     1 93 21   11.790        189     3 57 21   13.748
140     1 96 00   11.832        190     3 61 00   13.784
141     1 98 81   11.874        191     3 64 81   13.820
142     2 01 64   11.916        192     3 68 64   13.856
143     2 04 49   11.958        193     3 72 49   13.892
144     2 07 36   12.000        194     3 76 36   13.928
145     2 10 25   12.042        195     3 80 25   13.964
146     2 13 16   12.083        196     3 84 16   14.000
147     2 16 09   12.124        197     3 88 09   14.036
148     2 19 04   12.166        198     3 92 04   14.071
149     2 22 01   12.207        199     3 96 01   14.107
150     2 25 00   12.247        200     4 00 00   14.142
Square Root
18.735 
18.762 
Number  Square     Square Root   Number  Square     Square Root
801     64 16 01   28.302        851     72 42 01   29.172
802     64 32 04   28.320        852     72 59 04   29.189
803     64 48 09   28.337        853     72 76 09   29.206
804     64 64 16   28.355        854     72 93 16   29.223
805     64 80 25   28.373        855     73 10 25   29.240
806     64 96 36   28.390        856     73 27 36   29.257
807     65 12 49   28.408        857     73 44 49   29.275
808     65 28 64   28.425        858     73 61 64   29.292
809     65 44 81   28.443        859     73 78 81   29.309
810     65 61 00   28.460        860     73 96 00   29.326
811     65 77 21   28.478        861     74 13 21   29.343
812     65 93 44   28.496        862     74 30 44   29.360
813     66 09 69   28.513        863     74 47 69   29.377
814     66 25 96   28.531        864     74 64 96   29.394
815     66 42 25   28.548        865     74 82 25   29.411
816     66 58 56   28.566        866     74 99 56   29.428
817     66 74 89   28.583        867     75 16 89   29.445
818     66 91 24   28.601        868     75 34 24   29.462
819     67 07 61   28.618        869     75 51 61   29.479
820     67 24 00   28.636        870     75 69 00   29.496
821     67 40 41   28.653        871     75 86 41   29.513
822     67 56 84   28.671        872     76 03 84   29.530
823     67 73 29   28.688        873     76 21 29   29.547
824     67 89 76   28.705        874     76 38 76   29.563
825     68 06 25   28.723        875     76 56 25   29.580
826     68 22 76   28.740        876     76 73 76   29.597
827     68 39 29   28.758        877     76 91 29   29.614
828     68 55 84   28.775        878     77 08 84   29.631
829     68 72 41   28.792        879     77 26 41   29.648
830     68 89 00   28.810        880     77 44 00   29.665
831     69 05 61   28.827        881     77 61 61   29.682
832     69 22 24   28.844        882     77 79 24   29.698
833     69 38 89   28.862        883     77 96 89   29.715
834     69 55 56   28.879        884     78 14 56   29.732
835     69 72 25   28.896        885     78 32 25   29.749
836     69 88 96   28.914        886     78 49 96   29.766
837     70 05 69   28.931        887     78 67 69   29.783
838     70 22 44   28.948        888     78 85 44   29.799
839     70 39 21   28.965        889     79 03 21   29.816
840     70 56 00   28.983        890     79 21 00   29.833
841     70 72 81   29.000        891     79 38 81   29.850
842     70 89 64   29.017        892     79 56 64   29.866
843     71 06 49   29.034        893     79 74 49   29.883
844     71 23 36   29.052        894     79 92 36   29.900
845     71 40 25   29.069        895     80 10 25   29.916
846     71 57 16   29.086        896     80 28 16   29.933
847     71 74 09   29.103        897     80 46 09   29.950
848     71 91 04   29.120        898     80 64 04   29.967
849     72 08 01   29.138        899     80 82 01   29.983
850     72 25 00   29.155        900     81 00 00   30.000
Number Square 
951 904401 
952 906304 
953 908209 
954 910116 
955 912025 
956 913936 
957 915849 
958 917764 
959 919681 
960 921600 
961 923521 
962 925444 
963 927369 
964 929296 
965 931225 
966 933156 
967 935089 
968 937024 
969 938961 
970 940900 
971 942841 
972 944784 
973 946729 
974 948676 
975 950625 
976 952576 
977 954529 
978 956484 
979 958441 
980 960400 
981 962361 
982 964324 
983 966289 
984 968256 
985 970225 
986 972196 
987 974169 
988 976144 
989 978121 
990 980100 
991 982081 
992 984064 
993 986049 
994 988036 
995 990025 
996 992016 
997 994009 
998 996004 
999 998001 
1000 1000000 
INDEX 


Accuracy, Standards of, in computa- 
tion, 20-24 

Ackerson, L., 340 

Actuarial prediction, through correlation, 
164 

Adkins, D. C., 303, 351 

Analysis of variance: principles of, 
268-273; how variances are ana- 
lyzed, 269-273; in determining sig- 
nificance of difference between inde- 
pendent means, 273-284; between 
correlated means, 285-296 

Anastasi, A., 346, 400 

Anderson, J. E., 350 

Arkin, H., 454 

Array, in a correlation table, 130 

Attenuation: correction of correlation 
coefficient for, 346-347; assumptions 
underlying, 347 

Average: definition of, 28; of correla- 
tion coefficients, 146-147. See also 
Mean, Median, and Mode. 


Bar diagram, 80-82 

Barlow’s Tables, 454 

Beta coefficients: in partial and multi- 
ple correlation, 393-394, 396-397; as 
“weights,” 393; calculation of, in 
Wherry-Doolittle method, 415-417 

Bias in sampling. See Sampling 

Binomial expansion: use in probabil- 
ity, 87-92; graphic representation 
of, 91 

Bi-serial correlation, 356-362; calculation 
of r_bis, 357-359; SE of r_bis, 
359; alternate formula for, 360-361; 
point bi-serial coefficient, 361-362 

Brigham, C. C., 214 

Burks, B. S., 420 


Central tendency, measures of, 28. 
See also Mean, Median, and Mode 

Chesire, L., 365 

Chi-square test, 254; as a measure of 
divergence from the null hypothesis, 
255-257, and from the normal distribution, 
257-258; when table entries 
are small, 258-261; when table 
entries are in percentages, 261-262; 
in contingency tables, 262-265; 
additive property of, 265 
Classification of measures into a fre- 
quency distribution, 4-9 
Class-interval: definition of, 6-8; 
methods of expressing, 7-8; mid- 
point of, 7-8; limits of, 7-8 
Clayton, B., 340 
Coefficient: of variation, or V, 57-60; 
of alienation, 174-176; of determination, 
in the interpretation of r, 
176-178 
Coefficient of correlation: meaning of, 
126-134; as a ratio, 126-128; repre- 
sented graphically, 131; computa- 
tion of, deviations from assumed 
means, 134-139; computation of, 
deviations from means, 139-142; 
computation of, deviations from 
zero, 142-145; averaging of, 146- 
147; effect of variability upon, 166- 
167; interpretations of, 172-178; re- 
liability of, 197-201 
Colton, R., 454 
Column diagram. See Histogram 
Comparison: of obtained distribution 
with normal probability curve, 101- 
103; of groups in terms of overlap- 
ping, 107-108. See also Chi-square, 
Skewness, and Kurtosis 
Computation, rules for, 20-24 
Confidence-intervals for the true 
mean, meaning of, 187-189 
Conrad, H. S., 176 
Contingency, coefficient of (C), 368-371; 
relation of C to chi-square, 368; 
methods of computing C, 369-370; 
comparison of C with r, 371 
Continuous series: definition of, 2-3; 
scores in, 3-4; tabulation of measures 
in, 4-9 

Correlation, linear, 122, 131-134; posi- 
tive, negative, and zero, 122-124; 
expressed as a ratio, 126-127; con- 
struction of table, 128-130; graphic 
representation of, 131-134; product-moment 
method in, 134-139; from 
ungrouped data, 139-146; difference 
formula in, 145-146; effect of errors 
of observation upon, 346-347; rank 
difference method of computing, 
353-356; spurious, 399-401. See also 
Partial correlation and Multiple 
correlation 

Correlation-ratio (eta), in non-linear 
relationship, 372 

Covariance, analysis of, 289-295 

Criterion: value of, in determining 
the validity of tests, 345-346; prediction 
of, by multiple regression 
equation, 391-394 

Critical ratio, definition of, 215. See 
also t-test 

Cumulative frequencies, method of 
computing, 63-64 

Cumulative frequency graph: con- 
struction of, 63-65; smoothing of, 
76-77 

Cureton, E. E., 389 

Curvilinear relationship, 371-373 


Data, continuous and discrete, 2-4 

Davis, F. B., 351 

Deciles. See Percentiles 

Degrees of freedom: meaning of, 193- 
194; in analysis of variance, 278, 
283, 287 

Deviation. See Quartile deviation, 
Mean deviation, and Standard devi- 
ation 

Differences, significance of: between 
means, 213-232; between medians, 
232; between standard deviations, 
232-236; between percentages, 236-239; 
between r's, 239-240. See also 
Standard error and Probable error 

Discrete series, 2 

Distribution, frequency. See Frequency 
distribution 

Dunlap, J. W., 196, 389 


Edgerton, H. A., 161 
Edwards, A. L., 268, 299, 453 


Elliott, R. M., 405 
Equivalent groups, method of, 228- 
230 


Error, curve of, 85-87. See also Nor- 
mal curve 

Errors: of sampling, 201-208; con- 
stant, 209. See also Probable and 
Standard errors 

Experimental hypotheses: testing of, 
247-254; null hypothesis, 247-248 

Ezekiel, M., 389, 418 


Ferguson, G. A., 350 

Fertig, J. W., 298 

Fiduciary limits (Fisher), 189; prob- 
ability, 189 

Fisher, R. A., 189, 198, 203, 249, 270, 
428, 453, 454 

Flanagan, J. C., 351 

Franzen, R., 59 

Frequency distribution: construction 
of, 4-8; graphical representation of, 
9-20; normalizing a, 307-311; rec- 
tangular and normal, 313-315 

Frequency polygon: construction of, 
11-12; smoothing of, 14-16; com- 
parison with histogram, 18, 20; 
comparison of two, on same axes, 
18, 19 

Froelich, G. J., 336 

F-test: in comparing two σ's, 233-234; 
in analysis of variance, 280-281 


Garrett, H. E., 186, 280, 400 

Goulden, C. H., 270 

Graphic representation: principles of, 
9-10; of correlation coefficient, 131. 
See also Frequency polygon, Histo- 
gram, Cumulative frequency graph, 
Percentile curve or Ogive, Line 
graph, Bar diagram 

Grouping: in tabulating a frequency 
distribution, 4-9; assumptions in, 
8-9 

Guilford, J. P., 351, 368, 453 

Gulliksen, H., 348, 351 


Hartshorne, H., 236 
Hawkes, Lindquist, and Mann, 115, 
351 


Heterogeneity, effect of: upon corre- 
lation, 166-167; upon the reliability 
coefficient, 344-345 


Hillegas, M. B., 317 

Histogram: definition of, 16-17; com- 
parison of, with frequency polygon, 
18, 20 

Holtzman, W. H., 189, 224 

Holzinger, K. J., 340 

Homogeneity, 43; effect of, upon cor- 
relation, 166-167 

Hull, C. L., 117, 324 


Inferences, errors in, 219-222 

Interaction, in analysis of variance, 
287 

Interval. See Class-interval 

Item analysis: problem of, 349; and 
selection, 349; and difficulty of, 350; 
and validity, 350-351 


Jackson, J. D., 340 
Johnson, P. O., 241, 453 
Jones, D. C., 94 

Jones, H. E., 173 
Jones, L. V., 217 


Kelley, T. L., 99, 342, 368 

Kelly, E. L., 340 

Kendall, M. G., 371, 395 

Kuder, G. F., 335 

Kurtosis: calculation of, 100-101; 
standard error of, 242-243 

Kurtz, A. K., 196 


Levels of confidence, 186-187 
Lewis, D., 190, 254 

Likert, R., 319 

Lindquist, E. F., 270, 453 
Line graphs, 78-80 

Long, J. A., 351, 361 


Martin, G. B., 176 

Matched groups, method of, 230-233 

May, M. A., 236, 380 

McCall, W. A., 308 

McNemar, Q., 85, 94, 219, 238, 208, 
294, 343, 453 

Mean, arithmetic: calculation of, 
from ungrouped scores, 28, from 
frequency distribution, 29-31, by 
“assumed mean” method, 36-39; 
when to use, 39; reliability of, 182- 
185; limits of accuracy for, 186- 
187 

Mean deviation, or MD: calculation 
of, from ungrouped data, 48-49; 
from grouped data, 49-50; when to 
use, 61 

Median: calculation of, from un- 
grouped scores, 31-32; from fre- 
quency distribution, 32-34; in spe- 
cial cases, 34-35; when to use, 39; 
reliability of, 194 

Merrill, M. A., 170, 343 

Method: single group, 225-228; equiv- 
alent groups, 228-230; matched 
groups, 230-232 

Midpoint of interval, as representative 
of all of the scores on the interval, 
7-8 

Mode: calculation of, 35-36; when to 
use, 40 


Mode, E. B., 453 


Moore, T. V., 420 

Morgan, J. J. B., 248 

Moving average, use of in smoothing 
a curve, 14-16 

Multiple coefficient of correlation, R, 
380; computation of, in a three- 
variable problem, 387; formulas for, 
395-397; beta coefficients in, 396-397; 
significance of, 397; "shrinkage" in, 
407; limitations to use of, 
419-420 

Multiple regression equations: for n 
variables, 391; for three variables 
(special form), 391-393; partial re- 
gression coefficients (b), 392-393; 
beta coefficients, 393-394 


Non-linear relationship, measurement 
of, 371-373 

Normal probability curve, 85-87; il- 
lustrations of, 85-86; deduction 
from binomial expansion, 90-92; in 
psychological measurement, 92-94; 
equation of, 94; properties of, 94- 
96; constants of, 94, 97; comparison 
of obtained distribution with, 101- 
103; use in solution of a variety of 
problems, 103-113; in scaling test 
scores, 323-326; in scaling judgments, 
326-327 

Normality: divergence of frequency 
distribution from, 113-118; normalizing 
a frequency distribution, 307- 
311; T-scores, 307-313 

Null hypothesis: in determining sig- 
nificance of coefficient of correlation, 
199-201; in testing reliability of differences, 
213; advantages of, 246- 
247; testing of, against direct deter- 
mination of probable outcomes, 
248-251; testing of, against normal 
curve frequencies, 251-253 

Numbers: rounded, 20-21; exact and 
approximate, 22 


Ogburn, W. F., 420 

Ogive: construction of, 69-71; percentiles 
and percentile ranks from, 70-75; 
uses of, 73-75; smoothing of, 76-77 
Order of merit, ranks, 323-327; 
changing into numerical scores, 
table for, 324 
Otis, A. S., 334 


Overlapping, in the measurement of 
groups, 107-108 


Parallel forms method, in reliability 
of test scores, 333-334 

Parameter, definition of, 181 

Partial correlation: value of, in analy- 
sis, 378-379; illustrations of, in a 
three-variable problem, 380-387; notation 
in, 387-388; formulas for partial 
r's, 387-389; significance of, 389; 
limitations to the use of, 419-420 

Paterson, D. G., 315, 405 

Pearson, K., 128 

Percentages: standard error of, 196; 
standard error of the difference be- 
tween two, 237-238 

Percentile ranks (PR): computation 
of, 68-69; construction of curve of, 
69-73; graphic method of finding 
ranks, 71-73; uses of curve of, 73- 
75; norms, 75-77; scale, use of, in 
combining test scores, 313-315; 
scale, disadvantages of, 315 

Percentiles: calculation of, 66-69; 
graphic method of finding, 70-74 

Perry, N. C., 368 

Peters, C. C., 168, 365, 371 

Phi-coefficient, calculation of, 367- 
368; relation to χ², 368 

Pintner, R., 315, 400 

Predictions: accuracy of, from regres- 
sion equations, 161-163; accuracy of 
group, 163-166; “regression effect” 
in, 171-172; from multiple regres- 
sion equations, 386-387, 394-395 


Probability, elementary principles of, 
87-92 

Probable error: relation to Q, 47; 
relation to σ, 97 

Product-moment method of finding r, 
134-139 


Quartile deviation (Q): calculation of, 
44-48; when to use, 61; reliability 
of, 195 

Quartiles, Q1 and Q3, computation of, 
44-48 


Range, as a measure of variability, 
44; when to use, 60-61; influence 
upon the coefficient of correlation, 
166-167 

Rank-difference method of computing 
correlations, 354-356; when to use, 
356 


Ranks, transmutation of, into units of 
amount, 323-327 

Rational equivalence, method of, in 
test reliability, 335-337 

Rectangular distribution, and normal, 
313-315 

Regression coefficient, 154-156; in 
partial and multiple correlation, 
392-394 

Regression effect, reasons for, 171-172 

Regression equations, 151-154; in deviation 
form, 154-157; in correlation 
table, 157-158; in score form, 
159-160; value of, in prediction and 
control, 160-161; limitations to use 
of, 162-166; formulas for, in partial 
and multiple correlation, 391-394 

Relative variability, coefficient of, 57-60. 
See also Coefficient of variation 

Reliability: meaning of, 180-183; of 
the mean, 182-185; in small samples, 
189-193; of the median, 194; 
of Q, 195; of σ, 195; of a percentage, 
196-197; sampling and reliability, 
201-209; of differences, independent 
means, 213-216, 222-225; of differences, 
correlated means, 225-232; of 
test scores, 332-344; index of, 341-342; 
dependence of coefficient of, 
upon the size and variability of the 
group, 344 

Remmers, H. H., 340 

Rhine, J. B., 248 

Richardson, M. W., 335, 351, 361 
Rider, P. R., 453 

Ruch, G. M., 340 

Russell, J. T., 165 


Saffir, M., 365 

Sampling: random, 202-205; strati- 
fied, 205-206; incidental, 206; pur- 
posive, 207; size of, 207-208; and 
errors of measurement, 208; bias 
and constant errors in, 209 

Sandiford, P., 351, 361 

Scale, definition of, 1 

Scaling: of test items (σ-scaling), 
301-305; of total scores, 305-307; of 
judgments, 316-318; of answers to 
a questionnaire, 319-322; of ratings, 
322-323. See also Percentile scale, 
T-scale 

Scatter diagram, 128-129 

Scores: definition of, 1; in continuous 
and in discrete series, 2-3 

Selection of tests in a battery, factors 
in, 397-399 

Semi-interquartile range, 44-48. See 
also Quartile deviation 

Shartle, C. L., 161, 174, 345, 404 

Shock, N. W., 340 

Sigma scores, and standard scores, 
305-307 

Significance: meaning of, 212; levels 
of, 216-217; two- and one-tailed 
tests of, 217-219; table for deter- 
mining, 427; .05 and .01 tables of, 
for т, 437-439 

Significant figures, 21 

Skewness: measurement of, 97-99; 
causes of, 114-118; standard error of 
measure of, 241-242 

Snedecor, G. W., 193, 270, 453 

Spearman-Brown prophecy formula in 
test reliability, 339-341 

Split-half method, in reliability of test 
scores, 3. 

Spurious correlation, 399; arising from 
heterogeneity, 399-400; of indices, 
400-401; of averages, 401 

Stalnaker, J. L., 361 

Standard deviation or σ, 50; calculation 
of, 51-52; calculation of, by 
Short Method, 52-54; calculation of, 
from raw scores, 54-56; in special 
cases, 56-57; when to use, 61; reliability 
of, 194-195; estimation of 
true value of, 347-348; formulas for, 
in partial correlation, 389-391 

Standard error, of a mean, in large 
samples, 182; in small samples, 190; 
of a median, 194; of σ, 195; of Q, 
195; of a percentage, 196; of r, 197; 
of the difference between means, 
213-232; of the difference between 
medians, 232; of the difference between 
r's, 239 

Standard error of an obtained score, 
342-343 

Standard error, of estimate, 161-163, 
in the interpretation of r, 174-175; 
in partial and multiple correlation, 
394-395 

Standard scores, 305-307; compared 
with T-scores, 312-313 

Statistic, definition of, 181 

Stead, W. H., 161, 174, 345, 404 

Student's distribution, table of, 427 

Symonds, P. M., 145 


Tabulation: of measures in a fre- 
quency distribution, 4-9; in a cor- 
relation table, 128-130 

Taylor, H. C., 165 

Terman, L. M., 170, 343 

Test items: relative difficulty of, 
302-305; analysis of, 349-350 

Test-retest method, in reliability of 
test scores, 333 

Test scores, factors affecting reliabil- 
ity of, 337-341 

Tetrachoric correlations, 362; calcu- 
lation of, 362-365; diagrams in, 365; 
SE of, 365-366; use of, in test evaluation, 
366-367 

Thomson, G. H., 400 

Thorndike, E. L., 86, 115 

Thorndike, R. L., 171, 387, 398 

Thurstone, L. L., 59, 316, 365 

Transmutation of measures, 316-327; 
of judgments, 316-323; of orders of 
merit, 323-327 

Treloar, A. E., 219, 453 

T-scale, 307-312; comparison with 
standard scores, 312-313; advan- 
tages of, 313 

t-test, meaning of, 190-192; compari- 
son with CR, 223; in analysis of 
variance, 275, 283-284, 285; table of 
t (Table D), 427 




Validity: relation of, to reliability, 
344; measurement of, in a test, 344- 
349; in terms of criteria, 345-346; 
indirect measures of, 345-346; of 
test battery, 348-349 

Van Voorhis, W. R., 168, 365, 371 

Variability: meaning of, 42; measures 
of, 43; coefficient of relative vari- 